of point sets. Their probability distributions are a help in the analysis of the
efficiency of the Quasi Monte Carlo method of numerical integration, which 
uses point sets that are distributed more uniformly than sets of independently 
uniformly distributed random points. In this thesis, generating functions of 
probability distributions of quadratic discrepancies are calculated using tech- 
niques borrowed from quantum field theory. 

The second part of this manuscript deals with the application of the Monte 
Carlo method to phase space integration, and in particular with an explicit 
example of importance sampling. It concerns the integration of differential 
cross sections of multi-parton QCD-processes, which contain the so-called 
kinematical antenna pole structures. The algorithm is presented and com- 
pared with RAMBO, showing a substantial reduction in computing time. 

For the sake of completeness, short introductions to probability
theory, Feynman diagrams and the Monte Carlo method of numerical integration
are included.
Chapter 1
Introduction 



The common thread of the subjects this manuscript deals with is the Monte Carlo method of numerical
integration. Therefore, the first section introduces the aspects of this method that are relevant for
the second section, which describes the main contents of this thesis.
1.1 Numerical integration

Numerical integration is an approximation of the solution to an integration problem. An inte- 
gration problem consists of the task to integrate a function f, the integrand. Sometimes, the 
integral can be calculated analytically, but in most cases, this is not possible. Let us, for sim- 
plicity, assume that the problem can be reduced to that of the calculation of an integral on the 
$s$-dimensional hypercube $K := [0,1]^s$. We denote the Lebesgue integral of the integrand $f$ by

$$ \langle f \rangle := \int_K f(x)\,dx . \tag{1.1} $$

With numerical integration, this integral is estimated by a weighted average of $f$ over a finite
sample of $N$ points $x_k \in K$, that is, by

$$ \sum_{k=1}^{N} w(x_k)\, f(x_k) \approx \langle f \rangle , \tag{1.2} $$

where the numbers $w(x_k)$ are the weights coming with the particular method. Such a method is
determined by the choice of the sample and the weights. 

In principle, the only restriction on a method to be acceptable is that, with the estimate of 
the integral, it should give an estimate of the expected error on the result. And of course, this 
expected error should not be too large. If a certain method cannot give an error estimate, it is 
useless. In practice, there is another restriction on a method to be acceptable, namely that the 
computational complexity it introduces should not be too large. It should be possible to do the 
computation within reasonable time. The computational complexity is due to the generation of 
the sample, evaluating the weights and evaluating the function values. Naturally, one expects that 
a result will become more accurate if larger samples are used, because then more information 
about the integrand is used. But if the evaluation of the function values is very expensive (time 
consuming), then one would like to use small samples, and indeed, there are methods that need 
smaller samples than other methods with the same accuracy. For these methods, however, the 
generation of the samples is more expensive. 

In the case of s = 1 , there are many acceptable and efficient methods. In most of them, the 
sample is chosen to be distributed evenly over [0, 1], i.e., all the distances between neighbors 
are the same and the whole of [0, 1] is covered. Different weights can be chosen, depending on 
the smoothness of the integrand. These methods give an expected error that decreases with the
number of points as $1/N^{\alpha}$ where $\alpha > 0$, with the general rule that $\alpha$ is larger for methods that
can be applied to smoother integrands (cf. [1]).

Conceptually, it is a small step to extrapolate these one-dimensional methods to more dimen- 
sions: the sample is taken to be the Cartesian product in the coordinates of the one-dimensional 
samples, and the weights are the products over the coordinates of the one-dimensional weights. 
Computationally, however, it is a large step, for the expected error decreases with $N$ as $1/N^{\alpha/s}$.
So to get an expected error that is of the "one-dimensional order" with $N$ points, you need $N^s$
points. This small disaster is often called the "curse of dimensionality".

A closer look at the choice of the samples reveals the cause of the curse. In one dimension, the 
even distribution of the points is the most uniform distribution possible. This makes the methods 
applicable to large classes of functions, because in the choice of the sample no knowledge about 
the integrand is assumed. As a result of this, the behavior with N of the expected error factorizes. 
In more dimensions, however, a Cartesian product of these one-dimensional distributions is not at 
all 'uniform'. The distances between neighbors in different directions are not the same anymore. 
Therefore, these methods can only be efficient for integrands that have the same kind of Cartesian 
symmetry. The error estimate, however, includes no knowledge about the integrand, and as a 
result of this, increases rapidly with the number of dimensions. 

1.1.1 Monte Carlo integration 

A popular remedy to the curse of dimensionality is the so called Monte Carlo (MC) method 
of numerical integration [15]. It is based on the belief that the points of the sample will be 
distributed fairly over K if they are chosen at random. To be more precise, the points are chosen 
at random, independently and uniformly distributed, and the estimate of the integral of a function
$f$ is given by the unweighted average

$$ \langle f \rangle_N := \frac{1}{N} \sum_{k=1}^{N} f(x_k) . \tag{1.3} $$

With this choice of the samples, the estimate of the integral becomes a random variable, and
probability theory can be applied to make statements about it (some relevant topics are reviewed in
Section 2.1). For example, the expectation value of $\langle f \rangle_N$ is given by

$$ E(\langle f \rangle_N) = \frac{1}{N} \sum_{k=1}^{N} E(f(x_k)) = \langle f \rangle . \tag{1.4} $$

So the expectation value of the estimate of the integral is equal to the integral itself. The variance
of $\langle f \rangle_N$ is equal to

$$ V(\langle f \rangle_N) = E(\langle f \rangle_N^2) - E(\langle f \rangle_N)^2 = \frac{\langle f^2 \rangle - \langle f \rangle^2}{N} , \tag{1.5} $$

where $f^2$ just denotes pointwise multiplication of $f$ with itself. This means that, if $f$ is square
integrable so that $\langle f^2 \rangle$ and $V(\langle f \rangle_N)$ exist, then we can apply the Chebyshev inequality, with
the result that for large $N$, the estimate $\langle f \rangle_N$ converges to $\langle f \rangle$ with an expected error given by
$\sqrt{V(\langle f \rangle_N)}$. This is a very important result, for it states that the Monte Carlo method works in
any dimension with the same rate of convergence, given by the $1/\sqrt{N}$-rule. The only restriction
is that $f$ has to be square integrable. If this is not the case, the Monte Carlo estimate of an integral
cannot be trusted.
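
As a minimal sketch of the $1/\sqrt{N}$-rule, the following Python fragment estimates an integral over $[0,1]^s$ with the unweighted average and compares the root-mean-square error for two sample sizes. The integrand, the dimension, the seed and all function names are choices made here for illustration only, and Python's built-in generator merely stands in for whatever random number generator one prefers.

```python
import math
import random


def integrand(x):
    """Illustrative integrand prod_i 2*x_i; its exact integral over [0,1]^s is 1."""
    value = 1.0
    for xi in x:
        value *= 2.0 * xi
    return value


def mc_estimate(f, s, n, rng):
    """Unweighted average of f over n independent uniform points in [0,1]^s."""
    total = 0.0
    for _ in range(n):
        total += f([rng.random() for _ in range(s)])
    return total / n


def rms_error(f, exact, s, n, repetitions, rng):
    """Root-mean-square deviation from the exact value over repeated estimates."""
    squares = [(mc_estimate(f, s, n, rng) - exact) ** 2 for _ in range(repetitions)]
    return math.sqrt(sum(squares) / repetitions)


if __name__ == "__main__":
    rng = random.Random(12345)
    for n in (100, 10000):
        print(n, rms_error(integrand, 1.0, 5, n, 20, rng))
```

With 100 times more points, the root-mean-square error is expected to shrink by roughly a factor of 10, whatever the value of $s$.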



1.1.1.1 Error estimation 

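In practice, one of course does not know $\langle f^2 \rangle - \langle f \rangle^2$, so that it has to be estimated. This makes
Monte Carlo integration a matter of statistics. A good estimator for the squared error is given by

$$ \langle f \rangle_N^{[2]} := \frac{\langle f^2 \rangle_N - \langle f \rangle_N^2}{N-1} , \tag{1.6} $$

which satisfies $E(\langle f \rangle_N^{[2]}) = V(\langle f \rangle_N)$. To get more confidence in the result, an estimate of the
squared error on the estimated squared error can be calculated with

$$ \langle f \rangle_N^{[4]} := \frac{\langle f^4 \rangle_N - 4 \langle f^3 \rangle_N \langle f \rangle_N + 3 \langle f^2 \rangle_N^2}{N(N-2)(N-3)} - \frac{4N-6}{(N-2)(N-3)} \left( \langle f \rangle_N^{[2]} \right)^2 , \tag{1.7} $$

which satisfies $E(\langle f \rangle_N^{[4]}) = V(\langle f \rangle_N^{[2]})$. Notice that $\langle f^4 \rangle$ has to exist in order to credit any value to
$\langle f \rangle_N^{[4]}$. One could, in principle, go on calculating higher errors on errors, but their significance
becomes less and less, if they converge at all.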
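
The two error estimators of this subsection can be sketched in a few lines of Python. The fragment below accumulates the first four sample moments in one pass and combines them into the estimated squared error and the estimated squared error on it; the function name, the one-dimensional integration region and the test integrand are assumptions made for this illustration.

```python
import random


def mc_with_errors(f, n, rng):
    """Return the estimate of the integral of f over [0, 1], the estimated
    squared error, and the estimated squared error on that squared error."""
    if n < 4:
        raise ValueError("the error-on-error estimator needs at least 4 points")
    m1 = m2 = m3 = m4 = 0.0
    for _ in range(n):
        y = f(rng.random())
        m1 += y
        m2 += y ** 2
        m3 += y ** 3
        m4 += y ** 4
    # sample averages of f, f^2, f^3 and f^4
    m1, m2, m3, m4 = m1 / n, m2 / n, m3 / n, m4 / n
    err2 = (m2 - m1 * m1) / (n - 1)
    err4 = ((m4 - 4.0 * m3 * m1 + 3.0 * m2 * m2) / (n * (n - 2) * (n - 3))
            - (4.0 * n - 6.0) / ((n - 2) * (n - 3)) * err2 ** 2)
    return m1, err2, err4


if __name__ == "__main__":
    estimate, err2, err4 = mc_with_errors(lambda x: x, 10000, random.Random(7))
    print(estimate, err2 ** 0.5, err4)
```

The square root of the second returned number is the quoted integration error; the third number indicates how far that error itself can be trusted.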
1.1.1.2 Sample generation

Another question is how to obtain the random points. Monte Carlo integration is preferably
done with the help of a computer, and there exist algorithms that produce sequences of numbers
between 0 and 1 that are 'as good as random'. They are called pseudo-random number generators
(cf. [2]). Since they are implemented on a computer, the algorithms are deterministic, and the
numbers they produce cannot be truly random. The sequences, however, 'look random' and are
certainly suitable for the use in Monte Carlo integration. Another drawback is that, because
computers represent real numbers by a finite number of bits, the algorithms necessarily have a
period, that is, they can produce only a finite number of numbers, and if they are recycled, they
cannot be considered random anymore. Fortunately, modern random number generators such as
RANLUX have very large periods, up to $10^{171}$.

Finally, the finiteness of a computer can cause problems when calculating a Lebesgue integral.
In the foregoing, we stated that the Monte Carlo method is always applicable if the integrand $f$ is
square integrable. For a computer, however, this is not enough. Consider the function

$$ f(x) := \begin{cases} 1 & \text{if } x \text{ is rational,} \\ 0 & \text{if } x \text{ is irrational,} \end{cases} \tag{1.8} $$

which has Lebesgue integrals $\langle f \rangle = \langle f^2 \rangle = 0$. A computer represents numbers with finite
strings of bits, i.e., the numbers are always rational, so that a Monte Carlo estimate will always
give $\langle f \rangle_N = 1$. Fortunately, pathological cases of this kind do not appear often in physical
applications.



1.1.2 Importance sampling 



The original problem is usually not that of the integration of a function on a hypercube. In 
general, it is the problem of integrating a function F on a more complicated manifold M. As we 
have seen before, the problem has to be reduced to that of integrating a function f on a hypercube 
$K$ in order to apply the MC method. This is done with a map $\varphi : K \mapsto M$, and sometimes, an
invertible map can be found in which cases we simply have

$$ \int_M F(y)\,dy = \int_K (F \circ \varphi)(x)\, |J_\varphi(x)|\,dx , \tag{1.9} $$

where $J_\varphi$ is the determinant of the Jacobian matrix of $\varphi$, so that $f(x) = (F \circ \varphi)(x)\,|J_\varphi(x)|$.
In general however, this is not the case, and a suitable mapping $\varphi : K \mapsto M$ and a function
$g_\varphi : K \mapsto \mathbf{R}$ have to be determined such that

$$ \int_K g_\varphi(x)\, \delta(\varphi(x) - y)\,dx = 1 , \tag{1.10} $$

where $\delta$ is the Dirac delta-distribution on $M$ (cf. [4]). The integral of $F$ over $M$ is then given by

$$ \int_M F(y)\,dy = \langle f_\varphi \rangle \quad \text{with} \quad f_\varphi(x) = (F \circ \varphi)(x)\, g_\varphi(x) . \tag{1.11} $$
If $\varphi$ is invertible, then $g_\varphi(x) = |J_\varphi(x)|$. We just used the word "suitable" in connection with
the determination of $\varphi$, and therefore in connection with the determination of the function $f_\varphi$,
which is not unique. Importance sampling is the effort to choose $\varphi$ such that $\langle f_\varphi^2 \rangle - \langle f_\varphi \rangle^2$ is as
small as possible, so that the expected error is as small as possible. The optimal choice would
be such that it is zero, but this would mean that $f_\varphi(x)$ is constant (equal to $\langle f_\varphi \rangle$) for all $x \in K$ and that the integration
problem is solved analytically. In practice, $f_\varphi$ should be chosen as flat as possible.
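
A small worked example may make the effect of the mapping concrete. In the Python sketch below, the integrand $F(y) = \cos(y)/\sqrt{y}$ on $M = (0,1]$ is an assumption chosen for illustration: it is integrable but not square integrable, so the identity map gives an untrustworthy estimate, whereas the invertible map $\varphi(x) = x^2$ with $|J_\varphi(x)| = 2x$ turns it into the flat function $2\cos(x^2)$. All names in the code are hypothetical.

```python
import math
import random


def F(y):
    """Illustrative integrand on M = (0, 1] with an integrable singularity at y = 0."""
    return math.cos(y) / math.sqrt(y)


def sample_mean_and_variance(values):
    """Sample mean and unbiased sample variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def naive(n, rng):
    """Identity map: f(x) = F(x), which is not square integrable."""
    return sample_mean_and_variance([F(1.0 - rng.random()) for _ in range(n)])


def mapped(n, rng):
    """Map phi(x) = x**2 with |J_phi(x)| = 2x: f_phi(x) = F(x**2) * 2x = 2*cos(x**2)."""
    values = []
    for _ in range(n):
        x = 1.0 - rng.random()  # uniform in (0, 1], avoids x = 0
        values.append(F(x * x) * 2.0 * x)
    return sample_mean_and_variance(values)


if __name__ == "__main__":
    print("naive :", naive(20000, random.Random(3)))
    print("mapped:", mapped(20000, random.Random(3)))
```

Both functions estimate the same integral, but the sample variance of the mapped integrand is much smaller, which is exactly what importance sampling aims for.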

1.1.3 Quasi Monte Carlo integration 

The Monte Carlo method is very robust, but the $1/\sqrt{N}$ rate of convergence can be considered
rather slow: to get one more significant digit in the result, 100 times more sample points are 
needed. The Quasi Monte Carlo (QMC) method tries to improve this behavior, by using samples 
the points of which are distributed more uniformly over the integration region than independent 
random points that are distributed uniformly in the integration region (cf. [3]). 

The previous sentence seems a bit paradoxical, but notice the difference between 'uniformly 
over' and 'uniformly in'. The latter is meant in the probabilistic sense: a random point is dis- 
tributed following a distribution in an integration region, which can be the uniform distribution. 
The former is meant for a set of points: the points can be distributed uniformly over the integra- 
tion region. In this case, the word 'uniformly' does not really have a meaning yet, and has to be 
defined, which is done by introducing measures of rates of uniformity. They are called discrepancies,
and return a number $D_N(X_N)$ for a sample, or point set, $X_N = (x_1, x_2, \ldots, x_N)$. The
idea is then that, the higher the number $D_N(X_N)$, the less uniformly the points are distributed.

The task in QMC integration is to find low-discrepancy point sets. The integral of a function
$f$ is then estimated again by the unweighted average $\langle f \rangle_N := \frac{1}{N} \sum_{k=1}^{N} f(x_k)$ over the point set.
That this approach can indeed improve the convergence of the error is, for example, shown by
the Koksma-Hlawka inequality, which states that

$$ |\langle f \rangle_N - \langle f \rangle| \le V_{HK}[f]\, D^*_N(X_N) , \tag{1.12} $$

where $D^*_N(X_N)$ is the so-called star discrepancy of $X_N$, and $V_{HK}[f]$ is the variation of $f$ in the
sense of Hardy and Krause. It is a complicated function of $f$ that is, however, independent of the
point set. This inequality states that the error, made by estimating the integral by an unweighted 
average over the function values at the points of the point set, decreases with the number of points 
N at least as quickly as the star discrepancy of the point set. 
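
To illustrate what 'lower discrepancy than random' means in the simplest setting, the sketch below compares the star discrepancy of a one-dimensional low-discrepancy point set with that of pseudo-random points. The van der Corput sequence in base 2 and the closed formula for the one-dimensional star discrepancy are standard textbook tools, used here as an example and not taken from this thesis; the function names are hypothetical.

```python
import random


def van_der_corput(k, base=2):
    """k-th element (k = 0, 1, ...) of the van der Corput sequence:
    the digits of k in the given base, mirrored around the radix point."""
    x, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, base)
        denom *= base
        x += digit / denom
    return x


def star_discrepancy_1d(points):
    """Exact star discrepancy of a point set in [0, 1]:
    1/(2N) + max_i |x_(i) - (2i - 1)/(2N)| over the sorted points x_(1) <= ... <= x_(N)."""
    xs = sorted(points)
    n = len(xs)
    return 1.0 / (2 * n) + max(abs(x - (2 * i + 1) / (2.0 * n)) for i, x in enumerate(xs))


if __name__ == "__main__":
    n = 1024
    rng = random.Random(5)
    print("van der Corput:", star_discrepancy_1d([van_der_corput(k) for k in range(n)]))
    print("pseudo-random :", star_discrepancy_1d([rng.random() for _ in range(n)]))
```

For the integrand $f(x) = x$, which has variation 1, the error of the unweighted average over the van der Corput points indeed stays below their star discrepancy, in line with the Koksma-Hlawka inequality.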

1.2 Contents of this thesis 

The main contents start in Chapter 3, and can be divided into two subjects: the calculation of
discrepancy distributions, and phase space integration with the emphasis on a special case of
importance sampling. Chapter 2 reviews some topics from probability theory and the formalism of
Feynman diagrams.
1.2.1 Calculation of discrepancy distributions

As discussed before, the relatively slow convergence of the MC method has inspired a search 
for other point sets whose discrepancy is lower than that expected for truly random points. The 
low-discrepancy point sets and low-discrepancy sequences have developed into a veritable indus- 
try, and sequences with, asymptotically, very low discrepancy are now available, especially for 
problems for which the dimension of the integration region is very large [16]. For point sets that 
are extracted as the first N elements of such a sequence, though, one is usually still compelled 
to compute the discrepancy numerically, and compare it to the expectation for random points in 
order to show that the point set is indeed 'better than random'. This implies, however, that one 
has to know, for a given discrepancy, its expectation value for truly random points, or preferably 
even its probability density (cf. Section 2.1.6). 

In Chapter 3, we introduce the formalism of the so-called quadratic discrepancies, and derive 
a formula for the generating function of their probability distribution. Furthermore, we give 
Feynman rules to calculate the generating function perturbatively using Feynman diagrams, with 
$1/N$ as expansion parameter. Chapter 4 addresses the question whether the asymptotic series
obtained is correct, and answers it in the affirmative for two examples of discrepancies, with great
confidence in the general case.

In [23, 24, 25] the problem of calculating the probability distribution of quadratic discrepan- 
cies under truly random point sets has been solved for large classes of discrepancies. Although 
computable, the resulting distributions are typically not very illuminating. The exception is usu- 
ally the case where the number of dimensions of the integration problem becomes very large, 
in which case a normal distribution often arises [22, 26]. In Chapter 5, we investigate this phe- 
nomenon in more detail, and we shall describe the conditions under which this 'law of large 
dimensions' applies. 

Throughout the discussion of Chapter 5, only the asymptotic limit of very large $N$ is considered,
which implies that no statements can be made on how the number of points has to approach
infinity with respect to the number of dimensions, as was for instance done in [26]. This problem
is tackled in Chapter 6, in which the diagrammatic expansion of the generating function is given
and calculated to low order for a few examples. For the Lego discrepancy, which is equivalent
to a $\chi^2$-statistic for $N$ data points distributed over a number of $M$ bins, cases in which $N$ as
well as $M$ become large are considered, leading to surprising results. Also the Fourier diaphony,
for which a limit is derived in [26], is handled, leading to a stronger limit.

1.2.2 Phase space integration 

A typical example in which the MC method is the only option is in the problem of phase space 
integration. It occurs in particle physics, where the connection between the model of the particles 
and the experiments with the particles is made with the help of transition probabilities (cf. [7]). 
These give the probability to get, under certain conditions, a transition from one certain state of 
particles (the initial state) to another certain state of particles (the final state). On one side, these 
probabilities can be determined statistically, by performing an experiment several times, starting
with the same initial state every time, and by counting the number of times certain final states
occur. The probabilities can also be calculated from the model, and then the two outcomes can be
compared to evaluate the model.

Phase space is the space of all possible momentum configurations of the final-state particles, 
and particle models predict probability densities on it. Because of the need of very high statistics 
for acceptable precision, it is usually difficult to determine them experimentally. A solution 
to this experimental problem is the creation of a mathematical problem: averaging transition 
probabilities over phase space. In the analysis of the experimental data this just means that final 
states, that differ only in momentum configuration, are considered equivalent. In the analysis of 
the model, this means that an integration of the probability density over phase space has to be 
performed. 

The actual quantity that physicists deal with is not the transition probability, but the cross 
section. If the number of initial particles is two, then it is the transition rate per unit of time, 
normalized with respect to the flux of the initial particles, i.e., the density of the initial particles 
times their relative velocity. The differential cross section $d\sigma$ of a process from a two-particle
initial state to a certain final state is given by

da(i^f) = ^^|Mf^p6(pf-pi)df . (1.13) 

Vi 

In this expression, $df$ represents the final-state degrees of freedom that have to be integrated or
summed in order to get the desired cross section $\sigma$. This includes the final-state momenta. The
delta-distribution represents momentum conservation between the initial and the final states, and
$v_i$ is the relative velocity of the initial particles. The characteristics of the particular process are
contained in $M_{f \leftarrow i}$, the transition amplitude or matrix element, which has to be calculated using the
particle model in the formalism of quantum field theory. It determines the function that has to be
integrated over phase space.

Besides momentum conservation, there are other restrictions the momenta of the particles 
have to satisfy, independent of the amplitude. Algorithms that generate random momenta, satis- 
fying these restrictions, are called phase space generators, and in Chapter 7 RAMBO is described, 
which generates momenta distributed uniformly in phase space. This chapter also deals with some
techniques that are useful for MC integration in general. 

For certain particle processes, the squared amplitude can have complicated peak structures, 
that make it hard to be integrated if the momenta are generated such that they are distributed 
uniformly in phase space. This is in particular true if it concerns processes in which the strong
interaction is involved, for which the integrand contains peak structures that are governed by the
so-called antenna pole structure. In Chapter 8, the algorithm SARGE is introduced that generates
random momenta, satisfying the restrictions that are independent of the amplitude, and such that
they are distributed following a density that contains the antenna pole structure. It improves the
MC integration process through importance sampling. 



Chapter 2 

Probability, measures and diagrams 



Since this thesis is meant to be read by both theoretical physicists and mathematicians, this 
chapter elaborates on some subjects that are probably not everyday routine to the one or the 
other. This concerns probability theory, including a (very) short introduction to martingales, and 
Feynman diagrams. The hasty reader is advised to read at least Section 2.1.6 and Section 2.1.7. 
2.1 Some probability theory

We start this section 'at level zero' with respect to the probability theory, but expect the reader to 
be familiar with a bit of set theory, logic, measure theory, complex analysis and so on. For more 
details, we refer to [10], [11] and [12]. 
2.1.1 Probability space

A probability space consists of a triple $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a set, $\mathcal{F}$ a $\sigma$-field of subsets of $\Omega$,
and $P$ a probability measure defined on $\mathcal{F}$. The collection $\mathcal{F}$ of subsets is called a $\sigma$-field if

1. $\Omega \in \mathcal{F}$ ;

2. $F \in \mathcal{F} \Rightarrow \Omega \setminus F \in \mathcal{F}$ ;

3. $F_1, F_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_n F_n \in \mathcal{F}$ ,

where, in the last property, the number of sets $F_n$ has to be countable. The probability measure
$P$ is a function $\mathcal{F} \mapsto [0, 1]$ with $P(\Omega) = 1$. We will only consider probability spaces for which
$\Omega$ is a Lebesgue measurable subset of $\mathbf{R}^n$, $n = 1, 2, \ldots$ and for which $P$ is given by



$$ P(F) = \int_F P(\omega)\,d\omega , \tag{2.1} $$

where $d\omega$ stands for the Lebesgue measure on $\Omega \subset \mathbf{R}^n$ and $P$ is a function $\Omega \mapsto [0, \infty)$ with
$\int_\Omega P(\omega)\,d\omega = 1$. $P$ is called the probability density or probability distribution, although the
latter name is more appropriate for the set of pairs $\{(F, P(F)) \mid F \in \mathcal{F}\}$.

A simple example of a probability distribution is the uniform distribution in $[0, 1]$, for which
$P(\omega) = 1$. This is often extended to more dimensions, say $n$, by taking the Cartesian product
of independent one-dimensional variables, that is, $P_n(\omega) = \prod_{k=1}^{n} P(\omega^{(k)})$, where
$\omega = (\omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(n)}) \in [0, 1]^n$ and $P(\omega^{(k)}) = 1$ for all $k$. We say that $\omega$ is distributed
uniformly in $[0, 1]^n$.



2.1.2 Random variables 

A random variable $X$ is a function on $\Omega$. It is an object about which statements $\Pi$ can be made.
These statements are then 'valued' with a number between 0 and 1 by the probability measure.
Probability theory concerns itself with the calculation of these numbers; their interpretation de- 
pends on the user. It can be a "rate of belief" (the Bayesian interpretation) or a ratio of outcomes 
in the limit of an infinite number of repetitions of experiments (the frequentist interpretation). In 
Monte Carlo integration, for example, the latter applies. 

Let $\Pi(X)$ denote a statement $\Pi$ about $X$, and let $F_{\Pi(X)} := \{\omega \in \Omega \mid \Pi(X(\omega)) \text{ is true}\}$ be the
subspace of $\Omega$ for which $\Pi(X(\omega))$ is true, then we denote

$$ P(\Pi(X)) := P(F_{\Pi(X)}) . \tag{2.2} $$

An important operator in the theory of probability is the expectation value E. It is the average of 
X over O, weighted with P: 



$$ E(X) := \int_\Omega X(\omega)\, P(\omega)\,d\omega . \tag{2.3} $$



Especially expectation values of powers of $X$ are often considered, and they are called the moments
of the probability distribution of $X$. This name anticipates the fact that a random variable
has its own probability distribution, which is simply defined through

$$ P_X(\Pi(Z)) := P(F_{\Pi(Z(X))}) , \tag{2.4} $$

where $F_{\Pi(Z(X))} := \{\omega \in \Omega \mid \Pi(Z(X(\omega))) \text{ is true}\}$. From now on, we will assume that $X$ is real,
and introduce the cumulative probability distribution or distribution function

$$ F_X(x) := P(X \le x) . \tag{2.5} $$

$F_X$ is a monotonically non-decreasing function $\mathbf{R} \mapsto [0, 1]$. Its derivative is the probability density $P_X$,
and we have

$$ F_X(x) = \int_{-\infty}^{x} P_X(t)\,dt . \tag{2.6} $$

Discontinuities in $F_X$ are represented by Dirac delta-distributions in $P_X$. An interesting observation
is, furthermore, that if $X$ is distributed following $F_X$, then the random variable $Y := F_X(X)$ is
distributed uniformly in $[0, 1]$, since $P(Y \le y) = P(X \le F_X^{-1}(y)) = y$.
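
Read in the opposite direction, this observation gives a recipe for generating a variable with a prescribed distribution from a uniform one: take $X = F_X^{-1}(U)$ with $U$ uniform in $[0,1)$. The sketch below does this for the unit exponential distribution, which is an example chosen here and not one used in the text; the function names are hypothetical.

```python
import math
import random


def exponential_cdf(x):
    """Distribution function F_X(x) = 1 - exp(-x) of the unit exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0


def exponential_inverse_cdf(y):
    """Inverse of F_X on [0, 1)."""
    return -math.log(1.0 - y)


def sample_exponential(n, rng):
    """X = F_X^{-1}(U), with U uniform in [0, 1), is distributed following F_X."""
    return [exponential_inverse_cdf(rng.random()) for _ in range(n)]


if __name__ == "__main__":
    xs = sample_exponential(20000, random.Random(11))
    ys = [exponential_cdf(x) for x in xs]  # should again look uniform in [0, 1]
    print(sum(xs) / len(xs), sum(1 for y in ys if y <= 0.25) / len(ys))
```

The first printed number should be close to the expectation value 1 of the exponential variable, the second close to 0.25, as expected for a uniform variable.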

We proceed with a translation of confidence levels, given by $P$, into expectation values. This
is done by the Chebyshev inequality, which states that, for a given number $a > 0$,

$$ P(|X| \ge a) \le \frac{E(|X|^s)}{a^s} \quad \text{for any } s \ge 1 . \tag{2.7} $$

Its proof is simple. We have

$$ P(|X| \ge a) = \int_{-\infty}^{-a} P_X(t)\,dt + \int_{a}^{\infty} P_X(t)\,dt \le \int_{-\infty}^{-a} \frac{|t|^s}{a^s} P_X(t)\,dt + \int_{a}^{\infty} \frac{|t|^s}{a^s} P_X(t)\,dt , $$

where the inequality holds because $|t|/a \ge 1$ under the integrals. The final expression is smaller
than the integral of the same integrand over the whole of $\mathbf{R}$, which is equal to the r.h.s. of Eq.(2.7). An example of its
use is an estimate of the probability that a variable $X$ will differ an amount $a$ from its expectation
value $E(X)$. The Chebyshev inequality tells us that

$$ P(|X - E(X)| \ge a) \le \frac{E(|X - E(X)|^2)}{a^2} = \frac{V(X)}{a^2} , \tag{2.8} $$

where

$$ V(X) := E(X^2) - E(X)^2 \tag{2.9} $$

is called the variance of $X$, and its square root $\sigma(X) := \sqrt{V(X)}$ is called the standard deviation.
So if we take $a = c \cdot \sigma(X)$, then we see that the probability of $|X - E(X)|$ to be larger than $c \cdot \sigma(X)$
is smaller than $1/c^2$.
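
A quick numerical check shows how conservative this bound typically is. The sketch below counts, for a sample of a unit exponential variable (an arbitrary example with $E(X) = 1$ and $\sigma(X) = 1$), how often the deviation from the expectation value reaches $c \cdot \sigma(X)$, and prints the fraction next to $1/c^2$. The function name and the sample size are illustrative choices.

```python
import math
import random


def tail_fraction(sample, mean, sigma, c):
    """Fraction of the sample that deviates at least c*sigma from the mean."""
    return sum(1 for x in sample if abs(x - mean) >= c * sigma) / len(sample)


if __name__ == "__main__":
    rng = random.Random(2024)
    # unit exponential variable: E(X) = 1 and sigma(X) = 1
    sample = [-math.log(1.0 - rng.random()) for _ in range(100000)]
    for c in (1.5, 2.0, 3.0):
        print(c, tail_fraction(sample, 1.0, 1.0, c), "<=", 1.0 / c ** 2)
```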

It is common not to consider the random variable itself, but the standardized variable, which is
given by

$$ \frac{X - E(X)}{\sigma(X)} . \tag{2.10} $$

It has its expectation value equal to zero and its variance equal to one.



2.1.3 Generating functions 

If $X$ is real, its probability density can be calculated as follows. Let $\theta$ denote the Heaviside
step-function. It can, for example, be represented by the integral in the complex plane

$$ \theta(t) = \frac{-1}{2\pi i} \int_\Gamma \frac{e^{-zt}}{z}\,dz , \tag{2.11} $$

where the contour $\Gamma$ is along the line $\mathrm{Re}\,z = -\varepsilon$, and $\varepsilon$ is positive and small. If $t > 0$, then the
integration contour can be closed to the right and the pole at $z = 0$ contributes with a residue
that is equal to $-1$. An extra minus sign comes from the orientation of the contour. If $t < 0$, then
the contour can be closed to the left, giving zero. The probability distribution function $F_X$ is then
given by

$$ F_X(t) = \int_\Omega \theta(t - X(\omega))\, P(\omega)\,d\omega = \frac{-1}{2\pi i} \int_\Gamma \frac{e^{-zt}}{z} \int_\Omega e^{zX(\omega)} P(\omega)\,d\omega\,dz . \tag{2.12} $$

The integral over $\Omega$ just gives the expectation value of $e^{zX}$, which is called the moment generating
function

$$ G_X(z) := E(e^{zX}) . \tag{2.13} $$

It carries this name, because its derivatives at $z = 0$ give the moments $E(X^m)$ of $X$. In the literature,
the characteristic function is often used, which is just given by $G_X(iz)$. The final result is that
$F_X$ is given by

$$ F_X(t) = \frac{-1}{2\pi i} \int_\Gamma \frac{e^{-zt}}{z}\, G_X(z)\,dz . \tag{2.14} $$

We can translate this into a formula for the probability density $P_X$ by differentiation with respect
to $t$. The result is that

$$ P_X(t) = \frac{1}{2\pi i} \int_\Gamma e^{-zt}\, G_X(z)\,dz , \tag{2.15} $$

i.e., it is the inverse Laplace transform of the moment generating function of $X$. Notice that the
generating function satisfies $G_X(0) = 1$, because the probability density $P_X$ is properly normalized:
$\int_{\mathbf{R}} P_X(t)\,dt = 1$.

Another generating function that is often used, the cumulant generating function $W_X$, is
simply given by $W_X(z) = \log(G_X(z))$. The first cumulant is equal to $E(X)$ itself, and the second
is the variance $V(X)$.

The generating function of the standardized variable can be expressed in terms of the original
generating function $G_X$ through

$$ E\left( e^{z (X - E(X))/\sigma(X)} \right) = e^{-z E(X)/\sigma(X)}\, G_X\!\left( z/\sigma(X) \right) . \tag{2.16} $$
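
These relations are easy to check numerically for a simple case. The sketch below takes a variable distributed uniformly in $[0,1]$, for which the moment generating function is $(e^z - 1)/z$, extracts the first two cumulants from $\log G_X$ by finite differences at $z = 0$, and builds the generating function of the standardized variable. The uniform example, the step size and the function names are assumptions made for this illustration.

```python
import math


def G_uniform(z):
    """Moment generating function E(exp(z*X)) = (exp(z) - 1)/z of X uniform in [0, 1]."""
    return 1.0 if z == 0.0 else math.expm1(z) / z


def first_two_cumulants(G, h=1e-3):
    """Central finite differences of W(z) = log G(z) at z = 0."""
    def W(z):
        return math.log(G(z))
    k1 = (W(h) - W(-h)) / (2.0 * h)
    k2 = (W(h) - 2.0 * W(0.0) + W(-h)) / h ** 2
    return k1, k2


def G_standardized(G, mean, sigma):
    """Generating function of (X - E(X))/sigma(X), expressed through G."""
    return lambda z: math.exp(-z * mean / sigma) * G(z / sigma)


if __name__ == "__main__":
    mean, variance = first_two_cumulants(G_uniform)
    print(mean, variance)  # close to 1/2 and 1/12
    print(first_two_cumulants(G_standardized(G_uniform, mean, math.sqrt(variance))))
```

The last line should print values close to 0 and 1, the expectation value and variance of a standardized variable.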
2.1.4 Convergence of random variables and distributions

Sequences $\{X_n \mid n = 1, 2, \ldots\}$ of random variables are often considered in probabilistic analyses,
and in particular their limiting behavior. Therefore, notions of convergence are needed, and
we distinguish various types. First there is convergence in probability, and we write

$$ X_n \xrightarrow{P} X \quad \text{if} \quad P_n(|X_n - X| > \varepsilon) \to 0 \quad \forall\, \varepsilon > 0 . \tag{2.17} $$

With the Chebyshev inequality, we see that the requirement for convergence in probability is
satisfied if there is a $p \ge 1$ such that $E(|X_n - X|^p)/\varepsilon^p \to 0$ for all $\varepsilon > 0$. This observation
suggests introducing convergence in $p^{\text{th}}$ mean, and we write

$$ X_n \xrightarrow{L^p} X \quad \text{if} \quad E(|X_n - X|^p) \to 0 . \tag{2.18} $$

The case of $p \to \infty$ can be considered special, and leads to almost sure convergence:

$$ X_n \xrightarrow{\text{a.s.}} X \quad \text{if} \quad X_n(\omega) \to X(\omega) \ \text{for all } \omega \in \Omega \setminus F , \tag{2.19} $$

where $F \in \mathcal{F}$ with $P(F) = 0$. To compare these notions of convergence, we note that (cf. [10])

$$ X_n \xrightarrow{\text{a.s.}} X \implies X_n \xrightarrow{P} X , \tag{2.20} $$
$$ X_n \xrightarrow{L^p} X \ \text{for some } p > 0 \implies X_n \xrightarrow{P} X . \tag{2.21} $$

Finally, there is convergence in distribution or convergence in law, and we write

$$ X_n \xrightarrow{d} X \quad \text{if} \quad P_n \xrightarrow{w} P , \tag{2.22} $$

where the latter denotes weak convergence of the distributions $P_n$ of the variables $X_n$:

$$ P_n \xrightarrow{w} P \quad \text{if} \quad E(f(X_n)) \to E(f(X)) \ \text{for any bounded continuous function } f . \tag{2.23} $$

Notice that, in general, the moments of the variables $X_n$ are not expectation values of bounded functions. The generating
functions $G_n(z)$, however, are bounded for imaginary $z$. We actually have (cf. [11])

$$ P_n \xrightarrow{w} P \iff G_n(z) \to G(z) \ \text{for each imaginary } z . \tag{2.24} $$

The notion of weak convergence is also used in connection with distribution functions, and we
write

$$ F_n \xrightarrow{w} F \quad \text{if} \quad F_n(x) \to F(x) \ \text{at all continuity points } x \text{ of } F . \tag{2.25} $$

Distribution functions are right-continuous and satisfy $F_n(-\infty) = 0$ and $F_n(\infty) = 1$. Because
$F$ does not have to be a distribution function in case of weak convergence, it is useful to define
complete convergence, and we write

$$ F_n \xrightarrow{c} F \quad \text{if} \quad F_n \xrightarrow{w} F \ \text{and} \ F_n(\pm\infty) \to F(\pm\infty) . \tag{2.26} $$

We note that weak convergence of a distribution function is not necessarily equivalent to weak convergence
of the distribution, but that (cf. [10])

$$ F_n \xrightarrow{c} F \iff P_n \xrightarrow{w} P , \quad \text{so that} \quad X_n \xrightarrow{d} X \ \text{if} \ F_n \xrightarrow{c} F . \tag{2.27} $$

We end this section with the remark that $X_n \xrightarrow{P} X$ implies $X_n \xrightarrow{d} X$ (cf. [10]), so that

$$ X_n \xrightarrow{\text{a.s.}} X \implies X_n \xrightarrow{P} X \implies X_n \xrightarrow{d} X . \tag{2.28} $$

2.1.5 Martingales 

With a sequence of random variables should come a sequence of $\sigma$-fields $\mathcal{F}_n$. A sequence
$\{Z_n, \mathcal{F}_n \mid n = 1, 2, \ldots\}$ is called a martingale if

1. $Z_n$ is measurable with respect to $\mathcal{F}_n$ ;

2. $E(|Z_n|) < \infty$ ;

3. $E(Z_n | \mathcal{F}_m) = Z_m$ with probability one for all $m < n$ .

The idea is that $Z_n$ depends on a number of $k_n$ variables $\omega_i$ that take their values in $\Omega$. In
$E(Z_n | \mathcal{F}_m)$, the first $k_m < k_n$ variables have to be taken fixed, and only the average over the
remaining $k_n - k_m$ variables has to be taken. This average can then be considered to depend on
the first $k_m$ variables again, and this dependence should be the same as the one of $Z_m$.

A martingale is called zero-mean if $E(Z_n) = 0$ for all $n$. Furthermore, it is called square-integrable
if $E(Z_n^2)$ exists for all $n$. A double sequence $\{Z_{n,i}, \mathcal{F}_{n,i} \mid 1 \le i \le k_n, n = 1, 2, \ldots\}$
is called a martingale array, if $\{Z_{n,i}, \mathcal{F}_{n,i} \mid 1 \le i \le k_n\}$ is a martingale for each $n \ge 1$. The
variables $X_{n,i} := Z_{n,i} - Z_{n,i-1}$ are called the martingale differences. These are the ingredients
needed for the powerful (cf. [12])

2.1.5.1 Central Limit Theorem:

Let $\{Z_{n,i}, \mathcal{F}_{n,i} \mid 1 \le i \le k_n, n = 1, 2, \ldots\}$ be a zero-mean, square-integrable martingale array
with differences $X_{n,i}$, and suppose that

$$ \max_i |X_{n,i}| \xrightarrow{P} 0 , \tag{2.29} $$
$$ \sum_i X_{n,i}^2 \xrightarrow{P} 1 , \tag{2.30} $$
$$ E\left( \max_i X_{n,i}^2 \right) \ \text{is bounded in } n , \tag{2.31} $$
$$ \mathcal{F}_{n,i} \subseteq \mathcal{F}_{n+1,i} \quad \text{for } 1 \le i \le k_n , \ n \ge 1 . \tag{2.32} $$

Then $Z_{n,k_n} \xrightarrow{d} Z$, where $Z$ is a normal variable.

A normal variable has a Gaussian distribution with zero mean and unit variance, given by a
density $P(t) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} t^2)$ and generating function $G(z) = \exp(\tfrac{1}{2} z^2)$.
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2.1.5.2 Adaptive Monte Carlo integration 

We apply this theorem to adaptive Monte Carlo integration, as a small exercise. It concerns the problem of calculating the Lebesgue integral $\langle f\rangle_\Omega$ of a function $f$ on an integration region $\Omega$. Let $g_1, g_2, \ldots$ be a sequence of positive functions, where $g_k$ depends on $k$ variables $x_i$ that take their values in $\Omega$. We denote such a set of $k$ variables by $\{x\}_k := \{x_1, \ldots, x_k\}$. Assume that $\int_\Omega g_k(\{x\}_k)\,dx_k = 1$ for all values of $\{x\}_{k-1}$, so that

$$g_{\{x\}_{k-1}} : x_k \mapsto g_k(\{x\}_k) \qquad (2.33)$$

is a probability density in $x_k$. Let us also introduce the functions

$$g_k^{(r)} : x_k \mapsto \int_{\Omega^{k-1}} \frac{g_1(x_1)\cdots g_{k-1}(\{x\}_{k-1})}{g_k(\{x\}_k)^r}\,dx_1\cdots dx_{k-1} . \qquad (2.34)$$

In adaptive Monte Carlo integration, one generates a random point $x_1$ in $\Omega$ following a density $g_1$, and with this point a density $g_{\{x\}_1}$ is constructed to generate $x_2$, so that $g_{\{x\}_2}$ can be constructed to generate $x_3$ and so on. Then, one tries to estimate the integral $\langle f\rangle_\Omega$ with

$$\langle f\rangle_n := \frac{1}{n}\sum_{k=1}^{n}\frac{f(x_k)}{g_k(\{x\}_k)} . \qquad (2.35)$$

The expectation value and the variance of $\langle f\rangle_n$ can easily be calculated, with the result that $E(\langle f\rangle_n) = \langle f\rangle_\Omega$ and $V(\langle f\rangle_n) = V_n[f]/n$, where

$$V_n[f] := \frac{1}{n}\sum_{k=1}^{n}\Big(\langle g_k^{(1)} f^2\rangle_\Omega - \langle f\rangle_\Omega^2\Big) . \qquad (2.36)$$

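Monte Carlo integration is based on the observation that if $\langle g_k^{(1)} f^2\rangle_\Omega$ exists for every $k$, so that $V_n[f]$ is a finite number, the Chebyshev inequality gives

$$P\big(|\langle f\rangle_n - \langle f\rangle_\Omega| > \varepsilon\big) \le \frac{V_n[f]}{n\varepsilon^2} \quad\Longrightarrow\quad \langle f\rangle_n \to \langle f\rangle_\Omega , \qquad (2.37)$$

which suggests using $\langle f\rangle_n$ as an estimator of $\langle f\rangle_\Omega$, and interpreting $V_n[f]/n$ as the square of the expected integration error.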

We shall prove$^1$ now that $\langle f\rangle_n$ converges to $\langle f\rangle_\Omega$ with Gaussian confidence levels. Apart from the existence of $\langle g_k^{(1)} f^2\rangle_\Omega$, we shall need some more requirements, but first let us introduce the variables

$$Z_{n,i} := \sum_{k=1}^{i} X_{n,k} , \qquad X_{n,k} := \frac{1}{\sqrt{nV_n[f]}}\left(\frac{f(x_k)}{g_k(\{x\}_k)} - \langle f\rangle_\Omega\right) .$$

$^1$This is a correction of the erroneous proof in the original thesis.

Because we define the variables $Z_{n,i}$ explicitly as the sum of the differences $X_{n,k}$, we are clearly dealing with a martingale array (with $k_n = n$) satisfying (2.32). It obviously is zero mean, and it is square integrable by the requirement that $\langle g_k^{(1)} f^2\rangle_\Omega$ exists for all $k$. For the proof, we shall furthermore need the requirements that

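$$\lim_{n\to\infty}\sum_{i=1}^{n} E(X_{n,i}^4) = 0 \quad\text{and}\quad \lim_{n\to\infty}\sum_{i\neq j}^{n}\big|E(X_{n,i}^2X_{n,j}^2) - E(X_{n,i}^2)E(X_{n,j}^2)\big| = 0 . \qquad (2.38)$$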

The first one is satisfied if $E(X_{n,i}^4)$ exists for all $i$, which can be translated into the demand that $\langle g_k^{(1)} f^2\rangle_\Omega$ and $\langle g_k^{(3)} f^4\rangle_\Omega$ exist for all $k$. The second one puts a restriction on how strong the dependencies between the variables may be. This demand is, for example, satisfied if for every $n$ there are numbers $K_n(i, j)$ such that



gigi- • • gj-iff , , gi+T--gj_if? 

dxi • • • dxi 



Qi gj gj 



dXi+T--dXj < Kn(t,j) (2.39) 



and that satisfy lim^i-^oo ^^YJ]=i+^ Kn('i-) j] = 0- This is, for example, the case if K^(i, j) ~ 
1 j |. We prove along the line of argument as presented in [26], that the first three requirements 
of the theorem are satisfied. First we observe that the martingale is constructed such that 

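$$\sum_{i=1}^{n} E(X_{n,i}^2) = \frac{1}{nV_n[f]}\sum_{i=1}^{n}\Big(\langle g_i^{(1)} f^2\rangle_\Omega - \langle f\rangle_\Omega^2\Big) = 1 , \qquad (2.40)$$

so that, for requirement (2.31), we have $E(\max_i X_{n,i}^2) \le \sum_i E(X_{n,i}^2) = 1$. For (2.29) we use the Chebyshev inequality to find that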

P(max|X^,|>e) < Y.P[\Xr.^>t) < ^^E(X;j = ]^,^, Z^i^t) , 

which goes to zero for all $\varepsilon > 0$ by (2.38). Requirement (2.30) goes the same way:



n ^ Ti n 

P(| L ^n, - 1 1 > < 72 ( L E(X^,iXl,] - 2^ E(X^,] + l) 

i=1 i,j=1 i=1 

n 

+ ^(E{X^Xf)-E(X^)E(Xf))) , 

where we used (2.40) again. The final expression goes to zero for all $\varepsilon > 0$ by (2.38). The result is that, because the variables $Z_{n,n}$ converge to a Gaussian variable with zero mean and variance one, the random variables

$$\langle f\rangle_n = \sqrt{\frac{V_n[f]}{n}}\,Z_{n,n} + \langle f\rangle_\Omega \qquad (2.41)$$

converge to a Gaussian variable with mean $\langle f\rangle_\Omega$ and variance $V_n[f]/n$. Note that for non-adaptive Monte Carlo integration, for which the densities $g_k$ are equal to a fixed density $g$ for all $k$, the Lévy Central Limit Theorem applies (cf. [10]) and only the existence of $\langle f^2/g\rangle_\Omega$ is needed.
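
The estimator of this section, the average of $f(x_k)/g_k(\{x\}_k)$ over the generated points, can be illustrated with a minimal Python sketch. The piecewise-constant density on equal bins, the prior of one fictitious point per bin and the name `adaptive_mc` are assumptions made for the illustration only; they are not the scheme analysed above. The sketch merely shows that the density used for point $x_k$ may depend on all earlier points, as long as $f(x_k)$ is divided by exactly that density.

```python
import random


def adaptive_mc(f, n, bins=8, seed=1):
    """Estimate the integral of f over [0, 1] with a density that adapts to earlier points."""
    rng = random.Random(seed)
    sums = [1.0] * bins   # assumed prior: one fictitious earlier point with |f| = 1 per bin
    counts = [1] * bins
    total = 0.0
    for _ in range(n):
        # The density for this point is built from earlier points only.
        heights = [s / c for s, c in zip(sums, counts)]
        norm = sum(heights)
        b = rng.choices(range(bins), weights=heights)[0]
        x = (b + rng.random()) / bins          # uniform inside the chosen bin
        g = bins * heights[b] / norm           # value of the density at x
        total += f(x) / g
        # Adapt: update the running average of |f| in the bin that was hit.
        sums[b] += abs(f(x))
        counts[b] += 1
    return total / n
```

For example, `adaptive_mc(lambda x: 3 * x * x, 20000)` estimates an integral whose exact value is 1.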
2.1.6 Hypothesis testing, qualification and discrepancies

Probability theory is extensively used in the field of statistics. Statisticians try to derive probabil- 
ity distributions from empirical data, which are believed to be distributed following an existing, 
but unknown, distribution. In this section, some statistical procedures and their relevance to the 
main subjects of this thesis are discussed. 



2.1.6.1 Hypothesis testing 

One way to test a model of a physical system is by deriving from this model the probability dis- 
tribution according to which certain data from the system are supposed to be distributed. Then a 
test has to be developed, which measures the deviation between the probability distribution from 
the model, and the empirical distribution of the data. Because the actual probability distribution 
that the data seem to be drawn from is not known, this procedure belongs to the field of statistics, 
and it goes under the name of hypothesis testing. 

Let $X_N = \{x_1, \ldots, x_N\}$ be a sample of physical data, $P_N$ the probability density derived from the model (the hypothesis), and $T_N$ the statistical test. In order for the test to be suitable, it should be developed such that, if $X_N$ is distributed following $P_N$, then

$$\lim_{N\to\infty} T_N(X_N) = 0 . \qquad (2.42)$$

The idea is then that, for a finite number of data, $T_N(X_N)$ also has to be small if $X_N$ is distributed following $P_N$. If $T_N(X_N)$ happens to be too large, the hypothesis has to be rejected. The question is now: what is small or large? In order to answer this question, the probability distribution of $T_N$ under $P_N$ has to be calculated. The probability distribution function $F_{T,N}$ is given by

$$F_{T,N}(t) := \int_{\Omega_N}\theta\big(t - T_N(\omega)\big)\,P_N(\omega)\,d\omega , \qquad (2.43)$$

where $\Omega_N$ is the space the data $X_N$ can take their values in. The generic shape of $F_{T,N}$ and its derivative, the probability density, are depicted in Fig. 2.1. They tell us what the probability would be to find certain values for $T_N(X_N)$ if $X_N$ were distributed following the hypothesis, i.e., they give the confidence levels. For example, we can read off the first graph that the probability for $T_N(X_N)$ to be larger than 0.80 is about $1 - 0.95 = 0.05$. This means that it is not very probable to find a value for $T_N(X_N)$ this large, so that this number can be considered large.
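
When the distribution function of the test cannot be calculated analytically, it can be estimated by simulation. The following Python sketch does this under illustrative assumptions: the hypothesis is the uniform distribution on $[0, 1]$, the test is the one-dimensional Kolmogorov-Smirnov statistic, and the function names are hypothetical.

```python
import bisect
import random


def ks_statistic(points):
    """Largest deviation between the empirical and the uniform distribution function."""
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def sample_test_values(n_points, n_samples, seed=0):
    """Sorted values of the test for samples drawn under the hypothesis."""
    rng = random.Random(seed)
    return sorted(
        ks_statistic([rng.random() for _ in range(n_points)])
        for _ in range(n_samples)
    )


def distribution_function(sorted_values, t):
    """Fraction of simulated samples with test value <= t."""
    return bisect.bisect_right(sorted_values, t) / len(sorted_values)
```

Given an observed value `t_obs`, the quantity `1 - distribution_function(values, t_obs)` estimates the probability of finding a value at least this large under the hypothesis.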



2.1.6.2 Qualification of samples

Instead of for hypothesis testing, a test $T_N$ can also be used to qualify a sample of data $X_N$. Suppose that there is a notion of good and bad samples, and that this notion is translated into the test $T_N$: if $T_N(X_N)$ is small, then $X_N$ is good, and if $T_N(X_N)$ is large, then $X_N$ is bad. A first question can be whether the test makes sense, and an answer can again be given by the probability distribution. Suppose that the information available about the source of the data leads to a probability density $P_N$ according to which the data seem to be randomly distributed. If the probability density of $T_N$ looks like the one in Fig. 2.1, i.e., if it goes to zero for small values of $T_N(X_N)$, then the test makes sense. It means that the test is capable of distinguishing between good samples and the kind of samples that occur most often. The next question is then: what are small values of $T_N(X_N)$? The answer is that values are small if it is improbable to find them.

Figure 2.1: A probability distribution function and the probability density.

2.1.6.3 Qualification of algorithms and discrepancy 

It can also be the case that the data come from an (expensive) algorithm that was specially 
designed to produce good samples (for example integration points for numerical integration), and 
the question is whether the algorithm makes sense. Suppose there is another (cheap) algorithm 
that produces data distributed with density $P_N$. The probability distribution of $T_N$ determines the notion of smallness for the values of $T_N(X_N)$ again, and the expensive algorithm only makes sense if it produces samples with low values of $T_N(X_N)$ that are improbable to find. In the
mentioned case of numerical integration, good samples are the point sets that are distributed 
uniformly over the integration space, and the tests are called discrepancies (Section 3.1). 

Discrepancies have the structure of tests that measure the deviation between the empirical 
distribution of the point set, and the uniform distribution in the integration space. Algorithms to 
generate point sets following the uniform distribution (cf. [2]) can be considered 'cheap' com- 
pared to the special algorithms developed for numerical integration (cf. [16]). This seems para- 
doxical, since numerical integration asks for point sets that are distributed over the integration 
space as uniformly as possible (Section 1.1.3). The clue is that (random) point sets, generated 
following the uniform distribution, are not necessarily those that are distributed over the integra- 
tion space as uniformly as possible. A simple example is one-dimensional space. For a given 
number of points, the most uniform distribution possible clearly is the one for which all distances 
between the points are the same. However, if the points are distributed randomly following the 
uniform distribution, this situation will never occur. 
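
The one-dimensional example can be checked numerically. The sketch below, with hypothetical function names, computes the largest deviation between the fraction of points in $[0, y)$ and the length $y$ of that interval, once for equidistant points and once for points drawn from the uniform distribution. Placing the equidistant points at the midpoints of $N$ equal cells is an assumption of this illustration.

```python
import random


def star_discrepancy_1d(points):
    """sup over y of |(number of points in [0, y)) / N - y| for points in [0, 1]."""
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def equidistant_points(n):
    """Midpoints of n equal cells: all distances between neighbours are 1/n."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def random_points(n, seed=0):
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

For the equidistant set the value is $1/(2N)$, the smallest value any set of $N$ points can reach, whereas a random set of the same size gives a larger value.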

The example above gives a simple algorithm to generate good samples in one-dimensional space. Algorithms become 'expensive' if the samples have to be generated in higher-dimensional spaces.

2.1.7 Calculation of probability distributions

In statistics, probability distributions are used, and in probability theory, they are calculated. 
Part of this thesis deals with their calculation for discrepancies. The way this will be done 
is by calculating the generating function. The probability density can then be found using the 
inverse Laplace transform (Eq. (2.15)), which can, if necessary, be calculated through a numerical 
integral over a contour in the complex plane. 

The distribution is often calculated in certain limits, such as an infinite number of random 
variables or degrees of freedom. This is, in most cases, done because it simplifies the calcula- 
tion. These limits can, however, often be considered as the limiting cases in certain stochastic 
processes: in Monte Carlo integration, for example, the limit of an infinite number of integration 
points can be interpreted as the limit of an infinite run-time for a computer. 

If the generating function is considered, these limits correspond with weak convergence. However, if the generating function is calculated through all moments of the distribution, this corresponds with a stronger convergence: if $z \mapsto \sum_{p=0}^{\infty} a_p z^p/p!$ is a generating function and

$$E(X_n^p) \to a_p , \quad p = 0, 1, 2, \ldots \quad\text{then}\quad G_n(z) \to \sum_{p=0}^{\infty}\frac{a_p z^p}{p!} , \qquad (2.44)$$

but the opposite does not have to be true. The moments might even go to infinity, while the generating function converges to an analytic function, and we will encounter an explicit example in which this happens.

2.2 Feynman diagrams and Gaussian measures

Part of this thesis deals with the calculation of probability distributions of measures of non-uniformity of point sets $\{x_1, \ldots, x_N\}$ in an integration space. The measures that are considered can be written in terms of two-point functions as $\sum_{k,l}\mathcal{B}(x_k, x_l)$. Consequently, the calculation of the moments of the distributions involves the calculation of multiple convolutions of these two-point functions, and Feynman diagrams can be of help.

2.2.1 Feynman diagrams 

Feynman diagrams are drawings obtained by connecting vertices following some rules. Let us 
illustrate this with an example, in which three vertices are connected to a diagram: 

[diagram not reproduced] (2.45)

The vertices have a number of legs and, in this case, there are two kinds of legs. The rule to 
get from these vertices to the particular diagram could be that legs of the same kind have to 
be connected. One rule that will always apply to cases we consider is that all legs have to be 
connected to other legs. Notice that, with this rule and the one that connected legs have to be of 
the same kind, the diagram drawn above is not the only possible one. Also 

[diagram not reproduced] (2.46)

is a permitted diagram. The vertices in the diagrams are connected by lines. The previous two 
diagrams we also call connected as a whole, because one can walk from any vertex to any other 
vertex over lines. An example of a disconnected diagram can be obtained with twice as many 
vertices: 

[diagram not reproduced] (2.47)

This is a possible diagram if the previous rules are applied. 

If there are also rules how to assign a number to a diagram, these, together with the rules how 
to construct the diagrams, are called the Feynman rules. The Feynman rules make the diagrams 
of practical use. Certain calculations can be reduced to the assignment of numbers to a set of 
diagrams, which then have to be added to finish the whole calculation. We call such a number 
the contribution of the diagram, but this word shall often be omitted, and we will refer to 'sums 
of diagrams' instead of 'sums of contributions of diagrams'. In the following sections, we will 
give some examples, but first we derive 

2.2.1.1 A few general relations 

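Let $\eta_1, \eta_2, \eta_3, \ldots$ be elements of a commutative algebra over $\mathbf{C}$, and assume that there is an operation $\langle\langle\cdot\rangle\rangle : \prod_k \eta_k \mapsto \langle\langle\prod_k \eta_k\rangle\rangle \in \mathbf{C}$ which is linear over $\mathbf{C}$. Suppose that every $\eta_k$ represents a type of vertex, and that

$$\frac{\langle\langle\eta_1^{p_1}\eta_2^{p_2}\cdots\eta_n^{p_n}\rangle\rangle}{p_1!\,p_2!\cdots p_n!} \qquad (2.48)$$

can be interpreted as the sum of all possible diagrams with $p_1$ vertices of type $\eta_1$, $p_2$ vertices of type $\eta_2$ and so on. Then, the sum of all possible diagrams is given by

$$\sum_{n=1}^{\infty}\sum_{p(n)}\frac{\langle\langle\eta_1^{p_1}\eta_2^{p_2}\cdots\eta_n^{p_n}\rangle\rangle}{p_1!\,p_2!\cdots p_n!} , \qquad (2.49)$$

where the second sum is over all partitions $p(n)$ of $n$. Using the combinatorial rule of Eq. (2.91), which is derived in Appendix 2A, and the linearity of $\langle\langle\cdot\rangle\rangle$, we find that this is equal to

$$\sum_{m=1}^{\infty}\frac{1}{m!}\sum_{k_1,\ldots,k_m=1}^{\infty}\langle\langle\eta_{k_1}\eta_{k_2}\cdots\eta_{k_m}\rangle\rangle = \sum_{m=1}^{\infty}\frac{1}{m!}\Big\langle\Big\langle\Big(\sum_k\eta_k\Big)^m\Big\rangle\Big\rangle = \Big\langle\Big\langle\exp\Big(\sum_k\eta_k\Big)\Big\rangle\Big\rangle - 1 , \qquad (2.50)$$

so that

$$G(\{\eta\}) := 1 + \text{the sum of all possible diagrams} = \Big\langle\Big\langle\exp\Big(\sum_k\eta_k\Big)\Big\rangle\Big\rangle . \qquad (2.51)$$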

Now, we show that the sum of all possible connected diagrams is given by $\log G(\{\eta\})$. Define

$$G_n[\eta_m] := \frac{1}{n!}\Big\langle\Big\langle\eta_m^n\exp\Big(\sum_{k\neq m}\eta_k\Big)\Big\rangle\Big\rangle , \qquad (2.52)$$

so that

$$G(\{\eta\}) = \sum_{n=0}^{\infty} G_n[\eta_m] . \qquad (2.53)$$

$G_n[\eta_m]$ contains all diagrams with $n$ vertices of type $\eta_m$. The sum of all diagrams for which these $n$ vertices are contained in the same connected piece is denoted $C_n[\eta_m]$, so that

$$G_n[\eta_m] = \sum_{p(n)}\frac{C_1[\eta_m]^{p_1}C_2[\eta_m]^{p_2}\cdots C_n[\eta_m]^{p_n}}{p_1!\,p_2!\cdots p_n!} , \qquad (2.54)$$

where the sum in the r.h.s. is over all partitions $p(n)$ of $n$. Using Eq. (2.91) again, we find that

$$G(\{\eta\}) = \sum_{n=0}^{\infty}\frac{1}{n!}\sum_{l_1,\ldots,l_n=1}^{\infty} C_{l_1}[\eta_m]\cdots C_{l_n}[\eta_m] = \exp(W[\eta_m]) , \qquad (2.55)$$

where

$$W[\eta_m] := \sum_{n=1}^{\infty} C_n[\eta_m] \qquad (2.56)$$

is the sum of all diagrams for which all vertices of the kind $\eta_m$ are contained in the same connected piece. Because we can take any kind of vertex for $\eta_m$, the sum of all diagrams has to be given by the exponential of the sum of all connected diagrams, and we find that

$$\log G(\{\eta\}) = \text{the sum of all possible connected diagrams.} \qquad (2.57)$$
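
The statement that the logarithm of the sum of all diagrams gives the connected ones can be tried out on a toy analogue. In the Python sketch below, labelled simple graphs play the role of diagrams and every graph has contribution 1; this choice and the function name are assumptions of the illustration, not objects from the text. If $a_n$ counts all graphs and $c_n$ the connected graphs on $n$ labelled vertices, then $\sum_n a_n z^n/n! = \exp(\sum_n c_n z^n/n!)$, and differentiating this relation gives the recurrence used in the code.

```python
from math import comb


def connected_from_all(all_counts):
    """Solve a_n = sum_k C(n-1, k-1) c_k a_(n-k) for the connected counts c_n.

    all_counts[n] is the number of all labelled objects on n vertices,
    with all_counts[0] = 1 for the empty object.
    """
    connected = [0] * len(all_counts)
    for n in range(1, len(all_counts)):
        # Objects in which the piece containing vertex 1 has k < n vertices.
        disconnected = sum(
            comb(n - 1, k - 1) * connected[k] * all_counts[n - k]
            for k in range(1, n)
        )
        connected[n] = all_counts[n] - disconnected
    return connected


# All labelled simple graphs on n vertices: every one of the n(n-1)/2 edges is present or absent.
all_graphs = [2 ** (n * (n - 1) // 2) for n in range(6)]
connected_graphs = connected_from_all(all_graphs)
```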
2.2.2 Gaussian measures



An example of the use of Feynman diagrams is in calculations with Gaussian measures. We refer 
to [4] for more details about the formalism used. 

We are going to look at measures on spaces $C$ of real bounded functions on a subset $K$ of $\mathbf{R}^s$, where $s = 1, 2, \ldots$. The Lebesgue measure on $K$ we just denote $dx$. $K$ can also be a finite set, in which case the Lebesgue integral becomes a finite sum. A measure on $C$ will be denoted $\mu$, and for, not necessarily linear, functionals $\eta_1, \eta_2, \ldots, \eta_n$ on $C$, we denote

$$\langle\eta_1\eta_2\cdots\eta_n\rangle_\mu := \int_C \eta_1[\phi]\,\eta_2[\phi]\cdots\eta_n[\phi]\,d\mu[\phi] . \qquad (2.58)$$

The space of continuous linear functionals on $C$ is denoted $C'$, and a typical member is the Dirac measure $\delta_x$, which is for every $x \in K$ defined by

$$\delta_x[\phi] := \phi(x) . \qquad (2.59)$$

Furthermore, we introduce the so-called n-point functions, which are given by

$$S_n(x_1, x_2, \ldots, x_n) := \langle\delta_{x_1}\delta_{x_2}\cdots\delta_{x_n}\rangle_\mu . \qquad (2.60)$$

Notice that they are symmetric in their arguments. We will always assume that $\mu$ is normalized, so that $S_0 := \int_C d\mu[\phi] = 1$. For linear functionals, we will use the notation

$$\eta[\phi] = \int_K \eta(x)\,\phi(x)\,dx , \qquad (2.61)$$

although '$\eta(x)$' cannot always be seen as a function value. For example, $\delta_x(x)$ does not exist. If we combine this notation with the notation of Eq. (2.60), we can write

$$\langle\eta_1\eta_2\cdots\eta_n\rangle_\mu = \int_{K^n} S_n(x_1, x_2, \ldots, x_n)\,\eta_1(x_1)\eta_2(x_2)\cdots\eta_n(x_n)\,dx_1dx_2\cdots dx_n . \qquad (2.62)$$

The Fourier transform of a measure $\mu$ on $C$ is the function on $C'$ given by

$$\eta \mapsto \langle\exp(i\eta)\rangle_\mu , \qquad (2.63)$$

and $\mu$ is Gaussian if there is a quadratic form $Q$ on $C'$ such that the Fourier transform is given by

$$\langle\exp(i\eta)\rangle_\mu = \exp(-\tfrac{1}{2}Q[\eta]) . \qquad (2.64)$$

$Q$ can be written in terms of the two-point function, for take $\eta := \lambda\zeta$, where $\lambda$ is a real variable, and differentiate Eq. (2.64) twice with respect to $\lambda$ before putting it to zero. Then it is easy to see that

$$Q[\zeta] = \langle\zeta\zeta\rangle_\mu = \int_{K^2} S_2(x_1, x_2)\,\zeta(x_1)\zeta(x_2)\,dx_1dx_2 . \qquad (2.65)$$
With this result, we can express the n-point functions in terms of the two-point function. If we take $\eta := \sum_{i=1}^{n}\lambda_i\delta_{x_i}$, where $\lambda_i$, $i = 1, \ldots, n$ are real variables, and differentiate Eq. (2.64) once with respect to each of these variables before putting them to zero, we find that, for odd $n$, $S_n = 0$, and for even $n$ that

$$S_n(x_1, x_2, \ldots, x_n) = \sum_{\text{pairings}}\ \prod_{\text{pairs } (i<j)} S_2(x_i, x_j) , \qquad (2.66)$$

where the sum is over all ways to divide the arguments into pairs, and the product is over the pairs $(i, j)$, with $i < j$, of such a division.
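
The sum over pairings can be made concrete for a finite set $K$, where the two-point function is simply a symmetric matrix. The following Python sketch enumerates all ways to divide a list of arguments into pairs and evaluates the n-point function from a given two-point function; the names `pairings` and `n_point` and the matrix representation are assumptions of this illustration.

```python
def pairings(items):
    """Yield every way to divide the list items into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def n_point(s2, arguments):
    """n-point function of a Gaussian measure from its two-point function s2[i][j]."""
    if len(arguments) % 2:
        return 0.0
    total = 0.0
    for pairing in pairings(list(arguments)):
        term = 1.0
        for i, j in pairing:
            term *= s2[i][j]
        total += term
    return total
```

With all four arguments equal and $S_2 = \sigma^2$, the three pairings give $3\sigma^4$, the familiar fourth moment of a Gaussian variable.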



2.2.2.1 Diagrams 

The previous formula suggests interpreting $S_2(x_i, x_j)$ as a line that connects the arguments $x_i$ and $x_j$, so that the r.h.s. consists of all possible ways to connect the arguments $x_1, x_2, \ldots, x_n$ in pairs with lines. If there is a prescription to identify a number of $m \le n$ arguments, then they can represent a vertex. The number of arguments in a vertex, the number of legs, we call the order of the vertex.

A typical case in which arguments are identified is when integrals of the following type are calculated. Let $\eta_1, \eta_2, \eta_3, \ldots$ be a sequence of functionals acting on $C$ as

$$\eta_k[\phi] = \int_{K^k}\eta_k(x_1, \ldots, x_k)\,\phi(x_1)\cdots\phi(x_k)\,dx_1\cdots dx_k . \qquad (2.67)$$

Let, furthermore, $k_1, \ldots, k_m$ be a set of integers larger than zero, and denote $k(i) = \sum_{j=1}^{i} k_j$. The integrals we want to consider are given by

$$\int_{K^{k(m)}} S_{k(m)}\big(\{x\}_0^{k(m)}\big)\,\eta_{k_1}\big(\{x\}_0^{k(1)}\big)\cdots\eta_{k_m}\big(\{x\}_{k(m-1)}^{k(m)}\big)\,dx_1\cdots dx_{k(m)} , \qquad (2.68)$$

where we use the notation $\{x\}_j^l = \{x_{j+1}, x_{j+2}, \ldots, x_l\}$. The set of arguments that are the integration variables of the same $\eta_i$ can be considered identical. As a result, the whole integral is given by the sum of all possible diagrams with the $m$ vertices of the orders $k_1, \ldots, k_m$. The contribution of a diagram is obtained by convoluting the two-point functions with the functionals $\eta_k$ in the vertices.

The question we want to answer now is, given the Gaussian measure $\mu$ and the functionals $\eta_k$, what the sum of all possible diagrams is. Because $S_n$ is symmetric, it suffices to consider the integrals

$$\langle\eta_1^{p_1}\eta_2^{p_2}\cdots\eta_n^{p_n}\rangle_\mu . \qquad (2.69)$$

The diagrams that contribute have $p_1$ vertices of order 1, $p_2$ vertices of order 2 and so on. In the set of diagrams that contribute to this integral, there are many diagrams that look exactly the same because they only differ in the exchange of integration variables of the same vertex, or in the exchange of vertices of the same order. We do not want to count them separately, and therefore include in the contribution of a diagram the number of ways it can be obtained. We turn this number into a symmetry factor, by considering

$$\frac{1}{p_1!\,p_2!\cdots p_n!}\Big\langle\Big(\frac{\eta_1}{1!}\Big)^{p_1}\Big(\frac{\eta_2}{2!}\Big)^{p_2}\cdots\Big(\frac{\eta_n}{n!}\Big)^{p_n}\Big\rangle_\mu \qquad (2.70)$$

instead of (2.69). As a consequence, every vertex of order $k$ accounts for a factor $1/k!$, and the set of vertices of order $k$ accounts for a factor $1/p_k!$. The contribution of a diagram is then given by the number obtained calculating the convolutions of the $\eta_k$'s, represented by the vertices, with the $S_2$'s, represented by the lines, multiplied with the symmetry factor. This factor is the number of ways the diagram can be obtained, considering all vertices and all legs of vertices distinct, divided by $\prod_i p_i!\,(i!)^{p_i}$, where $p_i$ is the number of vertices of order $i$. We can use the results of Section 2.2.1 now and find that

$$1 + \text{the sum of all possible diagrams} = \Big\langle\exp\Big(\sum_{k=1}^{\infty}\frac{\eta_k}{k!}\Big)\Big\rangle_\mu , \qquad (2.71)$$

and that this is equal to the exponential of the sum of all connected diagrams.



2.2.3 Falling powers, diagrams and Grassmann variables 

Another, small, example of the use of Feynman diagrams is in the representation of the numbers

$$N^{\underline{k}} := N(N-1)(N-2)\cdots(N-k+1) , \quad N, k \in \mathbf{N} . \qquad (2.72)$$

It can be derived from the relation

$$N^{\underline{k}} = \sum_{i_1,\ldots,i_k=1}^{N}\ \sum_{\pi\in S_k}\mathrm{sgn}(\pi)\prod_{r=1}^{k}\delta_{i_r,i_{\pi(r)}} , \qquad (2.73)$$

where the second sum on the r.h.s. is over all permutations of $(1, 2, \ldots, k)$. This relation is derived in Appendix 2A. It allows for the following diagrammatic interpretation.

Consider 'arrowed' vertices of order two, that is, vertices of order two with distinct legs: one incoming and one outgoing. They can be connected with the rule that outgoing legs may only be connected to incoming legs and vice versa. The legs are connected with an 'arrowed' line, representing a $\delta_{i_1,i_2}$, and the vertices represent the convolution $\sum_{i_2=1}^{N}\delta_{i_1,i_2}\delta_{i_2,i_3}$ of the two lines arriving at and starting from that vertex. Up to an overall minus sign, the r.h.s. of Eq. (2.73) is equal to the sum of all possible diagrams with $k$ distinct 'arrowed' vertices, with the extra rule that every closed loop gives a factor $-1$. The overall minus sign is equal to $(-1)^k$. For example,

$$(-1)^3N^{\underline{3}} = [\text{six diagrams, not reproduced}] = -N^3 + 3N^2 - 2N . \qquad (2.74)$$

It is useful to consider diagrams that look exactly the same as one diagram again. In the example above, this applies to the last two diagrams, and to the second, the third and the fourth diagram. The extra number the contribution of a diagram has to be multiplied with is turned into a symmetry factor by considering $(-1)^kN^{\underline{k}}/k!$ instead of $(-1)^kN^{\underline{k}}$.
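
The loop rule can be checked by brute force. In the Python sketch below, every permutation of the $k$ vertices is read as one diagram, each of its cycles is a closed loop, and each loop contributes a factor $N$ from the index sum and a factor $-1$ from the rule above. The function names are hypothetical and the code only illustrates the counting.

```python
from itertools import permutations


def count_loops(perm):
    """Number of cycles (closed loops) of a permutation given as a tuple."""
    seen = set()
    loops = 0
    for start in range(len(perm)):
        if start in seen:
            continue
        loops += 1
        j = start
        while j not in seen:
            seen.add(j)
            j = perm[j]
    return loops


def diagram_sum(n, k):
    """Sum over all diagrams with k distinct arrowed vertices; every loop gives -n."""
    return sum((-n) ** count_loops(p) for p in permutations(range(k)))


def falling_power(n, k):
    """n (n-1) ... (n-k+1)."""
    result = 1
    for j in range(k):
        result *= n - j
    return result
```

For $k = 3$ the six permutations reproduce the six terms $-N^3 + 3N^2 - 2N$ of the example.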
2.2.3.1 Grassmann variables

The numbers $(-1)^kN^{\underline{k}}$ can also be written in another way. We introduce $2N$ Grassmann variables $\psi_i$ and $\bar\psi_i$, $i = 1, \ldots, N$. They all anti-commute with each other and commute with complex numbers:

$$\psi_i\psi_j + \psi_j\psi_i = 0 , \quad \bar\psi_i\psi_j + \psi_j\bar\psi_i = 0 , \quad \bar\psi_i\bar\psi_j + \bar\psi_j\bar\psi_i = 0 , \quad i, j = 1, 2, \ldots, N , \qquad (2.75)$$

$$c\psi_i - \psi_ic = 0 , \quad c\bar\psi_i - \bar\psi_ic = 0 , \quad i = 1, 2, \ldots, N , \quad c \in \mathbf{C} . \qquad (2.76)$$

These variables are nilpotent, i.e., $\psi_i\psi_i = \bar\psi_i\bar\psi_i = 0$ for all $i$. Products of even numbers of these variables commute with all other combinations. Furthermore, we introduce the 'integral' of these variables, which maps sums of products of them onto $\mathbf{C}$. It is linear over $\mathbf{C}$ and defined by the relations

$$\int[d\psi d\bar\psi]\,\psi_{i_1}\psi_{i_2}\cdots\psi_{i_k}\bar\psi_{j_1}\bar\psi_{j_2}\cdots\bar\psi_{j_l} := 0 \quad\text{if } k < N \text{ or } l < N , \qquad (2.77)$$

$$\int[d\psi d\bar\psi]\,\psi_1\bar\psi_1\psi_2\bar\psi_2\cdots\psi_N\bar\psi_N := 1 . \qquad (2.78)$$

Notice that the first integral is also zero if $k > N$, because then there has to be a pair $(\psi_{i_r}, \psi_{i_s})$ with $i_r = i_s$ in the product of $\psi$'s, so that $\psi_{i_r}\psi_{i_s} = 0$. The same holds if $l > N$. A useful calculation of such an integral is

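$$\int[d\psi d\bar\psi]\Big(\sum_{i=1}^{N}\bar\psi_i\psi_i\Big)^N = \sum_{\pi\in S_N}\int[d\psi d\bar\psi]\,\bar\psi_{\pi(1)}\psi_{\pi(1)}\bar\psi_{\pi(2)}\psi_{\pi(2)}\cdots\bar\psi_{\pi(N)}\psi_{\pi(N)} = \sum_{\pi\in S_N}(-1)^N\mathrm{sgn}(\pi)^2 = (-1)^NN! . \qquad (2.79)$$

Another useful relation is the following. If $\lambda$ is a complex number, then

$$\int[d\psi d\bar\psi]\exp\Big(-\lambda\sum_{j=1}^{N}\bar\psi_j\psi_j\Big) = \sum_{k=0}^{\infty}\frac{(-\lambda)^k}{k!}\int[d\psi d\bar\psi]\Big(\sum_{i=1}^{N}\bar\psi_i\psi_i\Big)^k = \lambda^N , \qquad (2.80)$$

since only the term with $k = N$ is non-zero. Let us now introduce the 'measure' defined by

$$\langle f\rangle_\psi := \int[d\psi d\bar\psi]\,f(\psi_1, \ldots, \psi_N, \bar\psi_1, \ldots, \bar\psi_N)\exp\Big(-\sum_{i=1}^{N}\bar\psi_i\psi_i\Big) . \qquad (2.81)$$

Using the result of the previous calculations, we find

$$\Big\langle\Big(\sum_{i=1}^{N}\bar\psi_i\psi_i\Big)^k\Big\rangle_\psi = \sum_{m=0}^{\infty}\frac{(-1)^m}{m!}\int[d\psi d\bar\psi]\Big(\sum_{i=1}^{N}\bar\psi_i\psi_i\Big)^{k+m} = (-1)^kN^{\underline{k}} , \qquad (2.82)$$

since only the term with $m = N - k$ contributes. If we combine the two representations of the numbers $(-1)^kN^{\underline{k}}$, we can draw the conclusion that

$$\frac{1}{k!}\Big\langle\Big(\sum_{i=1}^{N}\bar\psi_i\psi_i\Big)^k\Big\rangle_\psi = \text{the sum of all possible diagrams with } k \text{ 'arrowed' two-point vertices.} \qquad (2.83)$$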

2.2.4 Gaussian measures and Grassmann variables 

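As a final application, we are going to combine the previous two examples. Let $\mu$ be a Gaussian measure on a space $C$ of real bounded functions on a subset $K$ of $\mathbf{R}^s$, let $\psi_i$ and $\bar\psi_i$, $i = 1, \ldots, N$ be a set of Grassmann variables, and let us denote

$$\langle\langle X\rangle\rangle := \langle\langle X\rangle_\psi\rangle_\mu = \langle\langle X\rangle_\mu\rangle_\psi . \qquad (2.84)$$

If $\eta_1, \eta_2, \eta_3, \ldots$ is a sequence of functionals acting on $C$ as

$$\eta_k[\phi] = \int_{K^k}\eta_k(x_1, \ldots, x_k)\,\phi(x_1)\cdots\phi(x_k)\,dx_1\cdots dx_k , \qquad (2.85)$$

then

$$\frac{\langle\eta_1^{p_1}\eta_2^{p_2}\cdots\eta_n^{p_n}\rangle_\mu}{p_1!\,p_2!\cdots p_n!} \qquad (2.86)$$

can be calculated using diagrams with $p_1$ vertices of type $\eta_1$, $p_2$ vertices of type $\eta_2$ and so on, as described in Section 2.2.2. If we apply the results of Section 2.2.3, we see that

$$(-1)^{p(n)}N^{\underline{p(n)}}\,\frac{\langle\eta_1^{p_1}\eta_2^{p_2}\cdots\eta_n^{p_n}\rangle_\mu}{p_1!\,p_2!\cdots p_n!} , \qquad p(n) := p_1 + p_2 + \cdots + p_n , \qquad (2.87)$$

can be calculated by attaching an incoming and an outgoing 'arrowed' leg to each type of vertex, and using the Feynman rules of Section 2.2.3. Furthermore, we see that

$$(-1)^{p(n)}N^{\underline{p(n)}}\,\langle\eta_1^{p_1}\eta_2^{p_2}\cdots\eta_n^{p_n}\rangle_\mu = \langle\langle\chi_1^{p_1}\chi_2^{p_2}\cdots\chi_n^{p_n}\rangle\rangle , \qquad \chi_k := \eta_k\sum_{i=1}^{N}\bar\psi_i\psi_i . \qquad (2.88)$$

Each $\chi_k$ represents a vertex of the kind $\eta_k$, with attached to it an incoming and an outgoing 'arrowed' leg. Now, we can apply the relations of Section 2.2.1 to arrive at the result that

$$G(\{\chi\}) := 1 + \text{the sum of all possible diagrams with } \chi_k\text{-vertices} = \Big\langle\Big\langle\exp\Big(\sum_k\chi_k\Big)\Big\rangle\Big\rangle , \qquad (2.89)$$

and that the sum of the connected diagrams is equal to $\log G(\{\chi\})$.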
Appendix 2 A: Some combinatorial relations 

Consider a sequence f i , f 2, f 3, . . . of functions of integer arguments that are completely symmet- 
ric in those arguments. We want to establish a relation of the kind 

00 00 00 Pu(u) 

Y_ Z fm(ki,...,k^] = J^^fJTT^.lT^,... , n ) , (2.90) 

Ta=1 ki ,kTa=1 n=1 p(n) 



where the second sum on the r.h.s. is over all partitions $p(n)$ of $n$. Put like this, the relation is obviously incorrect, since on the l.h.s. all permutations $f_m(\pi(1), \pi(2), \ldots, \pi(m))$ are counted separately, whereas on the r.h.s., only $f_m(1, 2, \ldots, m)$ is counted. At first sight, it seems natural to correct for this by including a factor $1/m!$ on the l.h.s. This is, however, too crude, because permutations of equal $k_i$'s are not counted separately. This can again be cured by including a factor $1/(p_1(n)!\,p_2(n)!\cdots p_n(n)!)$ on the r.h.s., and we arrive at



Pi in) Vl (tl) Vn (tl) 



V- V- '^vn[^^ ) • • • > ^m) _ V~ V~ fn(1 > • • ■ > 1 . 2, . . . , 2, . . . , n ) 

i-ik k 1 ^' "rifi PiN! • P2(n)! •••pn(n)! ' ^ 

Ta=l Ki ,...,ICTn=l Tl=l p(n) 

Note that $p_n(n)$ is equal to 0 or 1.

Consider the $k \times k$ matrix $A(\{i\}^k)$, depending on $k$ integer variables $\{i\}^k := \{i_1, i_2, \ldots, i_k\}$ that run from 1 to $N$. The matrix is defined by

$$A_{r,s}(\{i\}^k) := \delta_{i_r,i_s} := \begin{cases} 1 & \text{if } i_r = i_s , \\ 0 & \text{if } i_r \neq i_s . \end{cases} \qquad (2.92)$$

Every diagonal element of this matrix is equal to 1 for every configuration $\{i\}$, because $i_r = i_s$ if $r = s$. Now consider a configuration $\{i\}$ for which all $i$'s are different, except for one pair $i_r = i_s$ with $r \neq s$. Then $A_{r,s}(\{i\}^k) = A_{s,r}(\{i\}^k) = 1$, and we see that row $r$ and row $s$ are the same, so that $\det A(\{i\}^k) = 0$. It is easy to see that this will always be the case if there are pairs $i_r = i_s$ with $r \neq s$. For a configuration in which all $i$'s are different, $A$ is the unit matrix and $\det A = 1$. The number of configurations $\{i\}$ for which all $i$'s are different is precisely $N^{\underline{k}}$, so that we can write down the following identity

$$N^{\underline{k}} = \sum_{i_1,\ldots,i_k=1}^{N}\det A(\{i\}^k) = \sum_{i_1,\ldots,i_k=1}^{N}\ \sum_{\pi\in S_k}\mathrm{sgn}(\pi)\prod_{r=1}^{k}\delta_{i_r,i_{\pi(r)}} , \qquad (2.93)$$

where the second sum on the r.h.s. is over all permutations of $(1, 2, \ldots, k)$. We just used the formula $\det A = \sum_{\pi\in S_k}\mathrm{sgn}(\pi)\prod_{r=1}^{k}A_{r,\pi(r)}$ to arrive at this result.
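
The determinant argument can be verified by brute force for small $N$ and $k$. The Python sketch below builds the matrix of Kronecker deltas for every configuration of indices, evaluates its determinant with the permutation formula quoted above, and sums the results. The function names are hypothetical, and the approach is only practical for small values.

```python
from itertools import permutations, product


def sign(perm):
    """Sign of a permutation given as a tuple, from its number of inversions."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1


def determinant(matrix):
    """Determinant as a sum over permutations of signed products of entries."""
    k = len(matrix)
    total = 0
    for perm in permutations(range(k)):
        term = sign(perm)
        for r in range(k):
            term *= matrix[r][perm[r]]
        total += term
    return total


def summed_determinant(n, k):
    """Sum of det A over all configurations of k indices running over n values."""
    total = 0
    for idx in product(range(n), repeat=k):
        a = [[1 if idx[r] == idx[s] else 0 for s in range(k)] for r in range(k)]
        total += determinant(a)
    return total
```

Only configurations with all indices different contribute, each with determinant 1, so the sum equals $N(N-1)\cdots(N-k+1)$.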



Chapter 3 

The formalism of quadratic discrepancies 



Discrepancies are measures of non-uniformity of point sets in subsets of s-dimensional Euclidean space. They are interesting in connection with numerical integration, because the integration error can be estimated in terms of the discrepancy of the point set used (Section 1.1.3). Their definition will be given in the first section of this chapter. An interesting feature of discrepancies is their probability distribution (Section 2.1.6), and a large part of this chapter is concerned with techniques, borrowed from quantum field theory, to calculate them for the so-called quadratic discrepancies. In the last section, some examples of quadratic discrepancies are given.

3.1 Definition of discrepancy 

The only subspace of $\mathbf{R}^s$, $s = 1, 2, \ldots$ that will be considered is the s-dimensional unit hypercube $K := [0, 1]^s$ since, in practice, an integration problem can always be reduced to one on $K$. A point set $X_N$ consists of $N$ points $x_k \in K$, $1 \le k \le N$. The coordinates of the points will be labeled with an upper index as $x_k^\nu$, $1 \le \nu \le s$. For an arbitrary subset $A$ of $K$, we define the characteristic function $\vartheta_A$ such that

$$\vartheta_A(x) := \begin{cases} 1 & \text{if } x \in A , \\ 0 & \text{if } x \notin A . \end{cases} \qquad (3.1)$$

The integral of a function $f$, Lebesgue integrable on $K$, we denote by

$$\langle f\rangle := \int_K f(x)\,dx , \qquad (3.2)$$

so that the Lebesgue measure of a region $A \subset K$ is given by $\langle\vartheta_A\rangle$. For every point set $X_N$, we introduce the estimate $\langle f\rangle_N$ of $\langle f\rangle$ using $X_N$ by

$$\langle f\rangle_N := \frac{1}{N}\sum_{i=1}^{N} f(x_i) . \qquad (3.3)$$

3.1.1 The original definition 

Naturally, a discrepancy of the point set is defined with respect to a certain family $\mathcal{A}$ of measurable subsets of $K$ as follows

$$D_N^{\mathcal{A}}(X_N) := \sup_{A\in\mathcal{A}}\big|\langle\vartheta_A\rangle_N - \langle\vartheta_A\rangle\big| . \qquad (3.4)$$

It is the largest absolute error one makes if one tries to estimate the measure of every subset $A \in \mathcal{A}$ by counting the number of points from $X_N$ in the subset. The idea is that, if a point set is suitable for estimating the measures of all subsets well, so that $D_N^{\mathcal{A}}(X_N)$ is small, then the point set must be distributed very "uniformly" over $K$. In order to arrive at a natural notion of uniformity, the family $\mathcal{A}$ of subsets has to be chosen sensibly. In principle, for every finite point set a subset of $K$ can be found, such that the discrepancy takes its maximum value, which is 1. A first restriction on the subsets one can for example take is that they have to be convex, i.e., for every $x_1, x_2$ in $A$ and every $t \in [0, 1]$ also $tx_1 + (1-t)x_2$ is in $A$.

This restriction still leaves many possible choices for the subsets, leading to different discrepancies (cf. [6]). An important example is the so-called star discrepancy, denoted by $D_N^*$, for which the family $\mathcal{A}^*$ consists of all subsets

$$A_y := [0, y^1)\times[0, y^2)\times\cdots\times[0, y^s) , \quad y \in K . \qquad (3.5)$$

It consists of all hyper-rectangles spanned by the origin and points $y \in K$. For this discrepancy, various theorems have been derived (cf. [6]), such as the Koksma-Hlawka inequality (Eq. (1.12)). In one dimension, $D_N^*$ is equal to the statistic of the Kolmogorov-Smirnov test for the hypothesis that the points are distributed randomly following the uniform distribution (cf. [2]).

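In order to proceed in a direction that leads to a definition of discrepancy that we will use, we introduce the $L_p$-discrepancy. If we denote $\vartheta_y := \vartheta_{A_y}$, then

$$D_N^{[p]}(X_N) := \Big(\int_K\big|\langle\vartheta_y\rangle_N - \langle\vartheta_y\rangle\big|^p\,dy\Big)^{1/p} . \qquad (3.6)$$

It is the $p$-th root of the average over $\mathcal{A}^*$ of the $p$-th power of the error made by estimating the measures of the subsets using $X_N$. This definition assures the limit $D_N^*(X_N) = \lim_{p\to\infty} D_N^{[p]}(X_N)$. Furthermore, it satisfies the bounds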



$$D_N^{[p]}(X_N) \le D_N^*(X_N) \le c(s, p)\,D_N^{[p]}(X_N)^{\frac{p}{p+s}} , \qquad (3.7)$$



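where $c(s, p)$ is independent of the point set [6]. For us, the case of $p = 2$ is in particular interesting. The expression for $D_N^{[2]}(X_N)$ can be evaluated further, with the result that

$$D_N^{[2]}(X_N) = \Big(\frac{1}{N^2}\sum_{k,l=1}^{N}\mathcal{B}(x_k, x_l)\Big)^{1/2} , \qquad (3.8)$$

where

$$\mathcal{B}(x_k, x_l) = \mathcal{C}(x_k, x_l) - \int_K\mathcal{C}(x_k, y)\,dy - \int_K\mathcal{C}(y, x_l)\,dy + \int_{K^2}\mathcal{C}(y_1, y_2)\,dy_1dy_2 , \qquad (3.9)$$

and

$$\mathcal{C}(x_k, x_l) = \int_K\vartheta_y(x_k)\,\vartheta_y(x_l)\,dy = \prod_{\nu=1}^{s}\min(1 - x_k^\nu, 1 - x_l^\nu) . \qquad (3.10)$$

In this case of $p = 2$, the discrepancy is called quadratic, and is completely determined by the two-point function $\mathcal{C}$.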
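
Because the two-point function is a product of minima, the quadratic discrepancy of a given point set can be evaluated in closed form. The Python sketch below uses the integrals $\int_0^1\min(1-x, 1-y)\,dy = (1-x^2)/2$ and $\int_0^1(1-x^2)/2\,dx = 1/3$ per coordinate, which follow from the definition of the two-point function as a product of minima. The function name is hypothetical, and the code is meant as an illustration, not as an optimized implementation.

```python
from math import prod


def quadratic_star_discrepancy(points):
    """L2 star discrepancy of a list of points, each a tuple of s coordinates in [0, 1]."""
    n = len(points)
    s = len(points[0])
    # Sum over all pairs of the two-point function prod_v min(1 - x_v, 1 - y_v).
    pair_sum = sum(
        prod(min(1 - a, 1 - b) for a, b in zip(x, y))
        for x in points
        for y in points
    )
    # Two-point function integrated over one argument: prod_v (1 - x_v^2) / 2.
    single_sum = sum(prod((1 - a * a) / 2 for a in x) for x in points)
    # Two-point function integrated over both arguments: 3^(-s).
    squared = pair_sum / n ** 2 - 2 * single_sum / n + 3.0 ** (-s)
    return max(squared, 0.0) ** 0.5  # guard against tiny negative rounding errors
```

For a single point at $x = 1/2$ in one dimension this gives $\sqrt{1/12}$, and for $N$ equidistant midpoints in one dimension it gives $1/(N\sqrt{12})$.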



3.1.2 Quadratic discrepancies 

The quadratic discrepancy invites generalizations. The number $\mathcal{B}(x_1, x_2)$ can be interpreted as a correlation between the points $x_1$ and $x_2$, and the discrepancy is a function of the sum over all correlations in the point set. Various quadratic discrepancies can be defined by choosing different, and sensible, two-point correlation functions. The two-point functions can, however, also be interpreted differently, leading to another approach to quadratic discrepancies. This approach is based on the insight by H. Woźniakowski [17], that $D_N^{[2]}$ can be written as an average case complexity. We will demonstrate this here by constructing the probability measure with respect to which the discrepancy can be written as an average. For more details about the formalism, we refer to [4].

Consider the Hilbert space $H := L_2(K)$ of (equivalence classes of almost everywhere equal)
real quadratically integrable functions on $K$. We denote the inner product and the norm on $H$ by

$$\langle f h\rangle := \int_K f(x)\,h(x)\,dx , \qquad \|f\| := \sqrt{\langle f f\rangle} . \tag{3.11}$$



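Let us, as before, denote $\vartheta_y(x) = \prod_{\nu=1}^{s} \theta(y^\nu - x^\nu)$, and let $\mathcal{P}$ be the "primitivation" operator,
defined by

$$(\mathcal{P}f)(x) := \int_K \vartheta_y(x)\, f(y)\, dy . \tag{3.12}$$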



$\mathcal{P}$ is a continuous linear map from $H$ to the space $C_W$ of continuous functions that vanish if
any coordinate $x^\nu = 1$. It even is a Hilbert-Schmidt operator: if $\{u_n\}$ is a basis of $H$, then
$\sum_n \|\mathcal{P}u_n\|^2 < \infty$.
The dual space $C_W'$, i.e. the space of all continuous linear functionals on $C_W$, consists of all
bounded measures on $[0,1)^s$. For such a measure $\eta$, we will use the notation

$$\eta[f] = \int_K f(x)\,\eta(x)\,dx , \tag{3.13}$$



although "$\eta(x)$" cannot always be seen as a function value. The transposed $\mathcal{P}^T$, which acts on $C_W'$
through the definition $(\mathcal{P}^T\eta)[f] := \eta[\mathcal{P}f]$, is then simply given by

$$(\mathcal{P}^T\eta)(x) = \int_K \vartheta_x(y)\,\eta(y)\,dy , \tag{3.14}$$



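and $\mathcal{P}^T\eta$ is a bounded function. Notice that, because $H$ is isomorphic to its dual, we can make the
straightforward identification $\mathcal{P}^T = \mathcal{P}^\dagger$, where $\mathcal{P}^\dagger$ is the adjoint of $\mathcal{P}$. There is a unique Gaussian
probability measure $\mu_W$ on $C_W$ which has Fourier transform

$$\int_{C_W} \exp(i\eta[\phi])\, d\mu_W[\phi] = \exp\!\big(-\tfrac{1}{2}\langle(\mathcal{P}^T\eta)^2\rangle\big) . \tag{3.15}$$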



It is going to serve as the probability measure mentioned before. In literature, it is known as the
Wiener sheet measure. By taking $\eta := \lambda(\lambda_1\delta_{x_1} + \lambda_2\delta_{x_2})$, where $\lambda, \lambda_1, \lambda_2$ are real variables and
$\delta_x$ denotes the Dirac measure $\delta_x[\phi] := \phi(x)$, and differentiating the above equation twice with
respect to $\lambda$, it is easy to see that $\mu_W$ has two-point function

$$\int_{C_W} \phi(x_1)\,\phi(x_2)\, d\mu_W[\phi] = \langle \mathcal{P}^T\delta_{x_1}\, \mathcal{P}^T\delta_{x_2} \rangle = \prod_{\nu=1}^{s} \min(1-x_1^\nu,\,1-x_2^\nu) . \tag{3.16}$$



So the two-point function with which the discrepancy is defined is the two-point function of the
Gaussian measure $\mu_W$ on $C_W$. Using this equation and Eq. (3.8) and Eq. (3.9), it is easy to see
that

$$D_N^{[2]}(X_N)^2 = \int_{C_W} \big| \langle\phi\rangle_N - \langle\phi\rangle \big|^2\, d\mu_W[\phi] . \tag{3.17}$$

So we can identify the square of the discrepancy as the average case complexity, defined as the
squared integration error averaged over $C_W$.

The particular choice of $\mathcal{P}$ led to the Wiener sheet measure. In principle, any Hilbert-Schmidt
operator can be used, leading to another Gaussian measure $\mu$. We want to apply this generalization.
For further analysis, it will therefore appear to be convenient to use the square, as it stands
on the l.h.s., as definition for quadratic discrepancy. Furthermore, the definition of discrepancy
is such that it goes to zero if the number of points in a uniformly distributed point set goes to
infinity. This is immediately clear from inequality (1.12), the l.h.s. of which goes to zero if $X_N$
consists of independent random points and $N$ goes to infinity. In fact, Monte Carlo integration
tells us that it goes to zero as $1/\sqrt{N}$. Therefore, it seems natural to use $N$ times the square of the
original definition of the quadratic discrepancy, especially since we want to calculate probability
distributions of discrepancies for large $N$. This multiplication with the factor $N$ is equivalent
to considering $\sqrt{N}$ times the average of $N$ random variables (with zero mean) when applying
the central limit theorem in probabilistic analyses.



3.1.2.1 Definition quadratic discrepancy 

We conclude this section with the definition of discrepancy we will further use. Given a Hilbert-Schmidt
operator $\mathcal{P}$ on $H := L_2(K)$, there is a Gaussian measure $\mu_{\mathcal{P}}$ on $H$ with Fourier transform

$$\int_H \exp(i\eta[\phi])\, d\mu_{\mathcal{P}}[\phi] = \exp\!\big(-\tfrac{1}{2}\langle(\mathcal{P}^T\eta)^2\rangle\big) . \tag{3.18}$$

In the case of the Wiener sheet measure, it even is a measure on a space of continuous functions,
but for the general case this is not necessary. The operator $\mathcal{P}$ should only be such that it maps
$H$ continuously on a space $C$ of continuous functions, so that there is a number $\rho$ such that
$\sup_{x\in K} |(\mathcal{P}f)(x)| \le \rho\,\|f\|$ for any $f \in H$. In Appendix 3A, we show that in that case the Dirac
measure can be properly defined under the measure $\mu_{\mathcal{P}}$, which we will need. We shall omit the
label $H$ at the integral symbol from now on.

We define the discrepancy of a point set $X_N$ in $K$ as the quadratic integration error, made by
using $X_N$, averaged under $\mu_{\mathcal{P}}$ over $H$:

$$D_N(X_N) := N \int \big| \langle\phi\rangle_N - \langle\phi\rangle \big|^2\, d\mu_{\mathcal{P}}[\phi] . \tag{3.19}$$



From now on, we will omit the argument $X_N$ when we denote the discrepancy. Using this
definition, the discrepancy can again be written in terms of two-point functions. With the Hilbert-Schmidt
operator comes a two-point function

$$\mathcal{C}_{\mathcal{P}}(x_1,x_2) := \int \phi(x_1)\,\phi(x_2)\, d\mu_{\mathcal{P}}[\phi] = \langle \mathcal{P}^T\delta_{x_1}\, \mathcal{P}^T\delta_{x_2} \rangle . \tag{3.20}$$

Because the combination $\phi - \langle\phi\rangle$ appears in the average, it is useful to introduce the notation

$$\tilde\phi(x) := \phi(x) - \langle\phi\rangle , \tag{3.21}$$

and the reduced two-point function

$$\mathcal{B}_{\mathcal{P}}(x_1,x_2) := \int \tilde\phi(x_1)\,\tilde\phi(x_2)\, d\mu_{\mathcal{P}}[\phi] , \tag{3.22}$$

which can be written in terms of $\mathcal{C}_{\mathcal{P}}$ like in Eq.(3.9). It has the important feature that it integrates
to zero with respect to each of its arguments. The discrepancy is given by

$$D_N = \frac{1}{N} \sum_{k,l=1}^{N} \mathcal{B}_{\mathcal{P}}(x_k,x_l) , \tag{3.23}$$

i.e., as a sum over two-point correlations between the points of $X_N$. The correlation function
is determined by the operator $\mathcal{P}$ in this formulation.



3.2 The generating function 

When $X_N$ consists of uniformly distributed random points, then the discrepancy $D_N$ is a random
variable with a certain probability density. In [22, 23, 24], the generating function

$$G(z) := E(e^{zD_N}) \tag{3.24}$$

has been used to calculate it. We will also concentrate on the calculation of $G(z)$. Given $G$, the
probability density can then be calculated by the inverse Laplace transform (Section 2.1.3).

It will turn out that it is far too complicated to calculate $G(z)$ analytically. For a large number
$N$ of points, however, a series expansion in $1/N$ can be made which can be calculated term by
term. We intend to calculate the generating function from an explicit expression in terms of $\mu_{\mathcal{P}}$,
which we will now derive.



3.2.1 The generating function as an average over functions 

First, we introduce the following bounded measure on $K$, which consists of a sum of Dirac
measures, centered around the points of the point set, minus one:

$$\eta_N(x) := \frac{1}{N} \sum_{k=1}^{N} \delta(x - x_k) - 1 . \tag{3.25}$$

The integration error of a function $\phi$ and the discrepancy can be written in terms of $\eta_N$:

$$\langle\phi\rangle_N - \langle\phi\rangle = \eta_N[\phi] \qquad\text{and}\qquad D_N = N\,\langle(\mathcal{P}^T\eta_N)^2\rangle . \tag{3.26}$$

Using this expression for the discrepancy and the relation of Eq. (3.18), we can write

$$\exp(zD_N) = \exp\!\big(zN\langle(\mathcal{P}^T\eta_N)^2\rangle\big) = \int \exp\!\big(\sqrt{2zN}\;\eta_N[\phi]\big)\, d\mu_{\mathcal{P}}[\phi] . \tag{3.27}$$

If now the definition of $\eta_N$ is used, and the integrals over $x_1, \ldots, x_N$ are performed on the
l.h.s. and the r.h.s., we arrive at

$$G(z) = \int \langle e^{g\tilde\phi} \rangle^N\, d\mu_{\mathcal{P}}[\phi] , \qquad g = \sqrt{\frac{2z}{N}} , \tag{3.28}$$

where we denote

$$\langle e^{g\tilde\phi} \rangle = \int_K \exp\!\big(g\tilde\phi(x)\big)\, dx . \tag{3.29}$$



3.2.2 Gauge freedom 



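For the calculation of the generating function of the probability density of the discrepancy, there
exists a freedom in the choice of the operator $\mathcal{P}$ with which the measure is defined, as we will
show now. Let $\mathcal{T}$ act on $H := L_2(K)$ such that $\mathcal{T}\mathcal{P}$ is a Hilbert-Schmidt operator on $H$ that maps $H$
continuously on a space of continuous functions. For each functional $F$ on $H$ there is a functional
$F\circ\mathcal{T}$ which maps $\phi \in H$ onto $F[\mathcal{T}\phi]$. We use this to define the measure $\mu_{\mathcal{TP}}$ by

$$\int F[\phi]\, d\mu_{\mathcal{TP}}[\phi] := \int F[\mathcal{T}\phi]\, d\mu_{\mathcal{P}}[\phi] , \tag{3.30}$$

so that its Fourier transform is given by

$$\int \exp(i\eta[\phi])\, d\mu_{\mathcal{TP}}[\phi] = \exp\!\big(-\tfrac{1}{2}\langle((\mathcal{TP})^T\eta)^2\rangle\big) , \tag{3.31}$$

and its two-point function is given by

$$\mathcal{C}_{\mathcal{TP}}(x_1,x_2) = \langle (\mathcal{TP})^T\delta_{x_1}\, (\mathcal{TP})^T\delta_{x_2} \rangle . \tag{3.32}$$

If $\mathcal{T}$ is such that $\widetilde{\mathcal{T}\phi} = \tilde\phi$ for all $\phi$, then

$$G(z) = \int \langle e^{g\tilde\phi} \rangle^N\, d\mu_{\mathcal{P}}[\phi] = \int \langle e^{g\tilde\phi} \rangle^N\, d\mu_{\mathcal{TP}}[\phi] , \tag{3.33}$$

and we call this property the gauge freedom. It leads to a freedom in the choice of the operator $\mathcal{P}$
with which the measure is defined, and we call these choices the gauges. Most gauge transformations
$\mathcal{T}$ we consider are global translations that are characterized by a functional $t : H \to \mathbb{R}$,
and are given by $(\mathcal{T}\phi)(x) := \phi(x) + t[\phi]$ for all $x$. They trivially satisfy $\widetilde{\mathcal{T}\phi} = \tilde\phi$.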

An example of a gauge transformation that satisfies the criteria is simply given by $\mathcal{T}\phi := \tilde\phi$.
It results in the Landau gauge, for which all $\phi$ satisfy $\langle\phi\rangle = 0$. This is, actually, the natural
gauge to choose, because it restricts the analysis to functions that integrate to zero, so that the
integration error becomes equal to the average of the function over the point set. The existence
of the gauge freedom originates from the fact that the integration error is the same for integrands
that differ only by a constant. The two-point function is equal to the reduced two-point function
in the Landau gauge: $\mathcal{C}_{\mathcal{TP}} = \mathcal{B}_{\mathcal{P}}$.

From now on, we will omit the label '$\mathcal{P}$' in the notation of the measures and the two-point
functions.

3.2.3 The path integral, perturbation theory and instantons

The connection of the foregoing with Euclidean quantum field theory is made via the path integral
formulation. The emphasis is put on the use of perturbation theory.

3.2.3.1 The path integral 

We want to express the generating function, as given by Eq. (3.28), in terms of a Euclidean path
integral (cf. [8]). We have to introduce the free action, which is a quadratic functional

$$S_0[\phi] := \tfrac{1}{2}\langle(\mathcal{P}^{-1}\phi)^2\rangle , \tag{3.34}$$

and we arrive at the path integral formulation of the measure $\mu$ by making the identification

$$d\mu[\phi] = [d\phi]\,\exp(-S_0[\phi]) . \tag{3.35}$$

It is a formal expression, where $[d\phi]$ represents the product over the whole of $K$ of the "infinitesimal
volume elements" $d\phi(x)$. This is, of course, ill-defined, and it gets even worse since the set
of functions $\phi$ for which $S_0[\phi]$ is finite in general has measure zero.

One thing we want to be more precise about is the fact that, in first instance, $S_0$ is only
well-defined on the image $\mathcal{P}H$ of $H := L_2(K)$ under $\mathcal{P}$, while this set has measure zero. The
members of the subsets of $H$ that do not have measure zero, however, usually do satisfy the
boundary conditions imposed by $\mathcal{P}$. We assume that these boundary conditions can be expressed
by a finite number of linear equations

$$T_i[\phi] = 0 , \qquad i = 1, 2, \ldots . \tag{3.36}$$

Then, the action should be extended as follows:

$$S_0[\phi] = \tfrac{1}{2}\langle(\mathcal{P}^{-1}\phi)^2\rangle + \tfrac{1}{2}\sum_i M_i\, T_i[\phi]^2 , \tag{3.37}$$

where $M_i \to \infty$ for all labels $i$. The "infinitesimal volume element" $[d\phi]$ gets a factor $\sqrt{M_i/2\pi}$
for every $i$ in order for the measure to stay normalized to one. The extra terms in the action assure
that the measure is zero if a function does not satisfy the boundary conditions. Notice that the
action is still quadratic in $\phi$.

If we apply all this to the expression of Eq. (3.28) for the generating function, we find that

$$G(z) = \int [d\phi]\, \exp(-S[\phi]) , \tag{3.38}$$

with an action $S$ given by

$$S[\phi] = S_0[\phi] - N\log\langle e^{g\tilde\phi}\rangle , \qquad g = \sqrt{\frac{2z}{N}} . \tag{3.39}$$

3.2.3.2 Perturbation theory and instantons

In the action of Eq. (3.39), $g$ appears to be a natural expansion parameter if $N$ is large. An
expansion around $g = 0$ will automatically result in an expansion of the generating function
around $z = 0$. Furthermore, it corresponds to an expansion of the action around $\phi = 0$. An
expansion of the action to evaluate the generating function, however, only makes sense when it
is an expansion around a minimum, so that it represents a saddle point approximation of the path
integral. Therefore, a straightforward expansion such as just proposed is only correct if it is an
expansion around the minimum of the action, that is, if the trivial solution $\phi = 0$ gives the only
minimum of the action. General extrema of the action are given by solutions of the field equation

$$(\Lambda\phi)(x) + Ng - Ng\,\frac{e^{g\phi(x)}}{\langle e^{g\phi}\rangle} = 0 , \tag{3.40}$$

where $\Lambda$ represents the self-adjoint operator $(\mathcal{P}^{-1})^\dagger\mathcal{P}^{-1}$ including the boundary conditions $T_i$
and possible boundary conditions coming from the fact that $\mathcal{P}^{-1}$ is not necessarily self-adjoint.$^1$
Depending on the value of $z$, non-trivial solutions may also exist. At this point it can be said that,
because $\Lambda\phi$ is real, non-trivial solutions only exist if $z$ is real and positive so that $g \in \mathbb{R}$. In
the analysis of the solutions we therefore can do a scaling $\phi(x) \leftarrow \phi(x)/g$ so that the action for
these solutions is given by

$$\frac{1}{N}\,S[\phi] =: \Sigma[\phi] = \frac{1}{4z}\langle\phi\,\Lambda\phi\rangle + \langle\phi\rangle - \log\langle e^{\phi}\rangle . \tag{3.41}$$

These non-trivial solutions we call instantons (cf. [9]), although this may not be a rigorously
correct nomenclature, in the field theoretical sense, for all situations we will encounter. Notice
that instantons under different gauges only differ by a constant: if two gauges are connected
by a global translation $\mathcal{T}$, and $\phi$ is an instanton in the $\mathcal{P}$-gauge, then $\mathcal{T}\phi$ is an instanton in the
$\mathcal{TP}$-gauge. The values of $z$ for which they appear and the value of the action are gauge invariant,
as can be concluded from Eq.(3.40) and Eq.(3.41).

If $N$ becomes large, then the contribution of an instanton to the path integral will behave as
$e^{-N\Sigma[\phi]}$, where $\Sigma[\phi]$ does not depend on $N$. (Notice that $\phi(x)$ does not depend on $N$ because the
field equation for these rescaled functions does not depend on $N$.) The $e^{-1/x}$-like behavior of
the instanton contribution makes it invisible in the perturbative expansion around $1/N = 0$. If
$\Sigma[\phi]$ is larger than zero, this will not be a problem, because the contribution will be very small.
If, however, $\Sigma[\phi]$ is equal to zero, then the contribution will be more substantial, and it will
even explode if $\Sigma[\phi]$ is negative. Notice that, to be able to make a perturbation series around
$\phi = 0$, the action has to be zero for this solution, for else the terms would all become zero or
would explode for large $N$.

The escape from this possible disaster is given by the fact that $z$ has to be real and larger
than zero for instantons to exist, and we want to integrate $G(z)$ along the imaginary $z$-axis (Section 2.1.3).
Also in the end, when we want to close the integration contour in the complex $z$-plane
to the right, we will not meet the problem, because the function we want to integrate is an
expansion in $z$ around $z = 0$ that can be integrated term by term. Problems might only occur when
instantons exist for values of $z$ that are arbitrarily close to $0$. We will confirm for a few cases that
this does not happen.

$^1$ For example, if $\mathcal{P}^{-1}\phi = \phi' := \frac{d\phi}{dx}$ and $T[\phi] = \phi(0)$, then $\langle(\phi')^2\rangle = \phi(1)\phi'(1) - \langle\phi\,\phi''\rangle$, so that the
extra boundary condition is $\phi'(1) = 0$.



3.2.4 Feynman rules to calculate the 1/N corrections

We just suggested a straightforward expansion in $1/N$ of $\exp(-S)$ to calculate $G$ perturbatively.
This way, however, the calculation of the perturbation series becomes very cumbersome, and the
reason for this is the following. We want to use the fact that an expansion in $1/N$ corresponds to
an expansion around $\phi = 0$ of the part of the action that is non-quadratic in $\phi$. The subsequent
terms in the expansions are therefore proportional to moments of a Gaussian measure, and can
be calculated using diagrams (Section 2.2.2). These diagrams, the Feynman diagrams, consist
of lines representing two-point functions and vertices representing convolutions of two-point
functions. Because the action is non-local, i.e. it cannot be written as a single integral over a
Lagrangian density because of the logarithm in Eq. (3.39), the total path integral, thus the total
sum of all diagrams, cannot be seen as the exponential of all connected diagrams, and it is this
that makes the calculations difficult.

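In order to circumvent this obstacle, we first of all use the Landau gauge, so that $\tilde\phi = \phi$
for all $\phi$. Secondly, we introduce $2N$ Grassmann variables $\bar\psi_i$ and $\psi_i$, $i = 1, 2, \ldots, N$, as in
Section 2.2.3, so that we can write

$$G(z) = \int \langle e^{g\phi} \rangle^N\, d\mu[\phi] = \int [d\bar\psi\, d\psi]\, \exp\!\Big( -\sum_{i=1}^{N} \bar\psi_i\psi_i\, \langle e^{g\phi} \rangle \Big)\, d\mu[\phi] . \tag{3.42}$$

If we now define

$$\eta_k[\phi] := -g^k \langle\phi^k\rangle , \qquad \chi_k := \eta_k[\phi] \sum_{i=1}^{N} \bar\psi_i\psi_i , \tag{3.43}$$

and

$$\langle\langle f \rangle\rangle := \int [d\bar\psi\, d\psi]\, f(\{\chi\})\, \exp\!\Big( -\sum_{i=1}^{N} \bar\psi_i\psi_i \Big)\, d\mu[\phi] , \tag{3.44}$$

we can write

$$G(z) = \Big\langle\!\Big\langle \exp\!\Big( \sum_{k=2}^{\infty} \frac{\chi_k}{k!} \Big) \Big\rangle\!\Big\rangle , \tag{3.45}$$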

which has exactly the form of the r.h.s. of Eq. (2.89). Because the functionals $\eta_k$ are also of the
kind of Eq. (2.85), we can use the statement of Eq. (2.89), that $G(z)$ is equal to the sum of the
contributions of all Feynman diagrams that can be constructed with the vertices

[diagram: vertex of order $k$] , $k \ge 2$ , (3.46)

and with the rules that all incoming legs have to be connected to outgoing legs and vice versa,
and all dashed legs have to be connected to dashed legs. The lines in the obtained diagrams stand
for propagators:

boson propagator: $x$ - - - $y$ $= \mathcal{B}(x,y)$ ; (3.47)
fermion propagator: $i \to j$ $= \delta_{i,j}$ , (3.48)

and to calculate the contribution of a diagram, boson propagators have to be convoluted in the
vertices as $\int_K \mathcal{B}(y,x_1)\,\mathcal{B}(y,x_2)\cdots\mathcal{B}(y,x_k)\,dy$, fermion propagators as $\sum_j \delta_{i_1,j}\,\delta_{j,i_2}$, and then
these convolutions have to be multiplied. To get the final result for a diagram, a factor $-g^k$ has
to be included for every vertex of order $k$, and the symmetry factor has to be included.

The contribution of the fermionic part can easily be determined, for every fermion loop only
gives a factor $-N$. The main problem is to calculate the bosonic part. Furthermore, only the
connected diagrams have to be calculated, since the sum of their contributions is equal to

$$W(z) := \log G(z) . \tag{3.49}$$

Because every vertex carries a power of $g$ that is equal to its order, the expansion in $g$ is an
expansion in the complexity of the diagrams, which can be systematically evaluated.



3.2.5 Gaussian measures on a countable basis 

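Because $\mathcal{P}$ is a Hilbert-Schmidt operator on $H := L_2(K)$, $\mathcal{P}\mathcal{P}^\dagger$ is a self-adjoint compact operator
on $H$, and there exists an orthonormal basis $\{u_n\}$ of $H$, consisting of eigenvectors of $\mathcal{P}\mathcal{P}^\dagger$. If we
denote the eigenvalues by $\sigma_n^2$, then the eigenvalue equation is given by

$$(\mathcal{P}\mathcal{P}^\dagger u_n)(x) = \int_K \mathcal{C}(x,y)\,u_n(y)\,dy = \sigma_n^2\,u_n(x) . \tag{3.50}$$

As the notation suggests, they are positive since $0 \le \|\mathcal{P}^\dagger u_n\|^2 = \langle u_n\,\mathcal{P}\mathcal{P}^\dagger u_n\rangle = \sigma_n^2$. Notice that,
because $\mathcal{P}$ is Hilbert-Schmidt, $\sum_n \|\mathcal{P}^\dagger u_n\|^2 < \infty$, and this leads immediately to the spectral
decomposition of $\mathcal{C}$, which is simply given by

$$\mathcal{C}(x,y) = \sum_n \sigma_n^2\, u_n(x)\,u_n(y) . \tag{3.51}$$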

In principle, the basis and the eigenvalues can be used as an alternative definition of a quadratic
discrepancy. They naturally introduce the spectral decomposition of a two-point function and a
reduced two-point function. The reasonable requirement of the existence of $E(D_N)$ leads to

$$E(D_N) = \sum_n \sigma_n^2 \big( \|u_n\|^2 - \langle u_n\rangle^2 \big) < \infty , \tag{3.52}$$

which is satisfied if $\mathcal{C}$ comes from a Hilbert-Schmidt operator. If we denote the expansion of a
function $\phi \in H$ by

$$\phi(x) = \sum_n \phi_n\, u_n(x) , \qquad \phi_n \in \mathbb{R} , \tag{3.53}$$

then the probability measure $\mu$ can be written as

$$d\mu[\phi] = \prod_n \frac{\exp(-\phi_n^2/2\sigma_n^2)}{\sqrt{2\pi\sigma_n^2}}\, d\phi_n , \qquad \sigma_n \in \mathbb{R} . \tag{3.54}$$

The basis functions will often be referred to as modes, originating from an example of a quadratic
discrepancy (the Fourier diaphony), for which the basis is the Fourier basis without the constant
mode.
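
Eq. (3.52) lends itself to a quick numerical illustration. The sketch below borrows the one-dimensional basis of Section 3.3.1, $u_n(x) = \sqrt{2}\cos((n+\tfrac12)\pi x)$ with $\sigma_n^2 = \pi^{-2}(n+\tfrac12)^{-2}$, for which $\langle u_n\rangle^2 = 2\pi^{-2}(n+\tfrac12)^{-2}$, and compares the truncated spectral sum with a Monte Carlo average of $D_N$ over uniformly random point sets. Function names, the truncation, the sample sizes and the seed are our own assumptions; both numbers should come out close to $1/6$.

```python
import numpy as np

def reduced_two_point_1d(x, y):
    """Reduced two-point function of the L2*-discrepancy for s = 1."""
    return (1.0 - np.maximum(x, y)
            - (1.0 - x**2) / 2.0 - (1.0 - y**2) / 2.0 + 1.0 / 3.0)

def expected_discrepancy_spectral(n_modes=100_000):
    """Truncated version of Eq. (3.52) with ||u_n|| = 1."""
    n = np.arange(n_modes)
    sigma2 = 1.0 / (np.pi * (n + 0.5)) ** 2
    mean_u_squared = 2.0 / (np.pi * (n + 0.5)) ** 2   # <u_n>^2
    return np.sum(sigma2 * (1.0 - mean_u_squared))

def expected_discrepancy_mc(n_points=20, n_trials=4000, seed=0):
    """Average of D_N = (1/N) sum_{k,l} B(x_k, x_l) over random point sets."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_trials, n_points))
    b = reduced_two_point_1d(x[:, :, None], x[:, None, :])
    return (b.sum(axis=(1, 2)) / n_points).mean()

print(expected_discrepancy_spectral(), expected_discrepancy_mc())
```

The expectation value does not depend on $N$, because the reduced two-point function integrates to zero in each argument, so that only the diagonal terms $k = l$ survive the average.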

With different gauges come different bases and strengths. We call a gauge in which the basis
is orthonormal a Feynman gauge. If the Landau gauge is used, in which $\langle\phi\rangle = 0$, then the basis
functions have to integrate to zero:

$$\langle u_n^{L} \rangle = 0 \quad \forall\, n , \tag{3.55}$$

where the label $L$ indicates the Landau gauge. These functions are the solutions of the eigenvalue
equation

$$\int_K \mathcal{B}(x,y)\, u_n^{L}(y)\, dy = \sigma_{L,n}^2\, u_n^{L}(x) . \tag{3.56}$$

It will not always be possible to find the Landau basis. In terms of a basis that is not in the
Landau gauge, $\mathcal{B}$ is given by

$$\mathcal{B}(x,y) = \sum_n \sigma_n^2\, \big(u_n(x) - \langle u_n\rangle\big)\big(u_n(y) - \langle u_n\rangle\big) . \tag{3.57}$$



3.3 Examples 

Some explicit, and well known, examples of quadratic discrepancies are introduced, and cast in 
the formalism of this chapter. 

3.3.1 The $L_2^*$-discrepancy

In our definition, the $L_2^*$-discrepancy is $N$ times the square of the case of $p = 2$ in Eq.(3.6). The
operator $\mathcal{P}$ and the two-point function $\mathcal{C}$ are given by

$$(\mathcal{P}\phi)(x) = \int_K \vartheta_y(x)\,\phi(y)\,dy , \qquad \mathcal{C}(x_1,x_2) = \prod_{\nu=1}^{s} \min(1-x_1^\nu,\,1-x_2^\nu) , \tag{3.58}$$

where $\vartheta_y(x) := \prod_{\nu=1}^{s} \theta(y^\nu - x^\nu)$. The boundary conditions imposed by $\mathcal{P}$ are given by $\phi(x) = 0$
if at least one of the coordinates $x^\nu = 1$. The basis functions can now be found by solving
the eigenvalue equation $\int_K \mathcal{C}(x,y)\,u(y)\,dy = \sigma^2 u(x)$. The equation factorizes for the different
coordinates, and is most easily solved by differentiating twice on the l.h.s. and the r.h.s. The
one-dimensional solutions that satisfy the boundary conditions are

$$u_n(x^\nu) = \sqrt{2}\,\cos\!\big((n+\tfrac{1}{2})\pi x^\nu\big) , \qquad \sigma_n^2 = \pi^{-2}(n+\tfrac{1}{2})^{-2} , \qquad n = 0, 1, 2, \ldots . \tag{3.59}$$

The set $\{u_n\}$ clearly is orthonormal, and it is the basis in the one-dimensional case. For $s > 1$,
the basis and the strengths are given by all possible products

$$u_{\vec n}(x) = 2^{s/2} \prod_{\nu=1}^{s} \cos\!\big((n_\nu+\tfrac{1}{2})\pi x^\nu\big) , \qquad \sigma_{\vec n}^2 = \pi^{-2s} \prod_{\nu=1}^{s} (n_\nu+\tfrac{1}{2})^{-2} , \tag{3.60}$$

where $\vec n := (n_1, n_2, \ldots, n_s)$ and $n_\nu = 0, 1, 2, \ldots$ for $\nu = 1, \ldots, s$. The reduced two-point
function is given by

$$\mathcal{B}(x_1,x_2) = \prod_{\nu=1}^{s} \min(1-x_1^\nu,\,1-x_2^\nu) - 2^{-s}\prod_{\nu=1}^{s} \big(1-(x_1^\nu)^2\big) - 2^{-s}\prod_{\nu=1}^{s} \big(1-(x_2^\nu)^2\big) + 3^{-s} . \tag{3.61}$$

In one dimension, its eigenfunctions and eigenvalues are

$$u_n^{L}(x) = \sqrt{2}\,\cos(n\pi x) , \qquad \sigma_{L,n}^2 = \pi^{-2} n^{-2} , \qquad n = 1, 2, \ldots . \tag{3.62}$$

For $s > 1$, it is difficult to find all solutions to the eigenvalue equation, and we will address this
problem in Section 5.2.2.
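
The one-dimensional eigenpairs quoted above can be checked numerically. The following sketch approximates the integral in the eigenvalue equation with the midpoint rule and reports the largest deviation from $\sigma^2 u(x)$, both for the kernel $\min(1-x,1-y)$ with the modes $\sqrt2\cos((n+\tfrac12)\pi x)$ and for the reduced kernel with the Landau modes $\sqrt2\cos(n\pi x)$. The helper names and the grid size are our own assumptions.

```python
import numpy as np

def c_kernel(x, y):
    """Two-point function of the L2*-discrepancy for s = 1."""
    return np.minimum(1.0 - x, 1.0 - y)

def b_kernel(x, y):
    """Reduced two-point function of the L2*-discrepancy for s = 1."""
    return c_kernel(x, y) - (1.0 - x**2) / 2.0 - (1.0 - y**2) / 2.0 + 1.0 / 3.0

def eigen_residual(kernel, u, sigma2, n_grid=2000):
    """Largest deviation of int_0^1 kernel(x, y) u(y) dy from sigma2 * u(x).

    The integral is approximated by the midpoint rule on n_grid cells;
    x runs over the same midpoints.
    """
    y = (np.arange(n_grid) + 0.5) / n_grid
    lhs = (kernel(y[:, None], y[None, :]) * u(y)[None, :]).mean(axis=1)
    return np.max(np.abs(lhs - sigma2 * u(y)))

def residuals(n_max=4):
    out = []
    for n in range(n_max):
        k = (n + 0.5) * np.pi      # modes of the kernel C
        out.append(eigen_residual(
            c_kernel, lambda t, k=k: np.sqrt(2.0) * np.cos(k * t), 1.0 / k**2))
        kl = (n + 1) * np.pi       # Landau modes of the reduced kernel B
        out.append(eigen_residual(
            b_kernel, lambda t, kl=kl: np.sqrt(2.0) * np.cos(kl * t), 1.0 / kl**2))
    return out

print(max(residuals()))
```

The residuals are limited only by the quadrature error, which shrinks as the grid is refined.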

3.3.2 The Cramér-von Mises goodness-of-fit test

The $L_2^*$-discrepancy is equivalent to the statistic of the Cramér-von Mises goodness-of-fit test,
which tests the hypothesis that $N$ data are distributed independently following a cumulative
distribution function $F$ (cf. [2, 18]). Consider, for simplicity, the one-dimensional case, so that
$x_k \in \mathbb{R}$, and denote $\vartheta_{x_1}(x_2) := \theta(x_1 - x_2)$ and $\langle\phi\rangle_N := N^{-1}\sum_{k=1}^{N}\phi(x_k)$. The statistic is given
by

$$W_N^2 := N \int_{\mathbb{R}} \big| \langle\vartheta_x\rangle_N - F(x) \big|^2\, dF(x) , \tag{3.63}$$

where we put the extra factor $N$ again, just as in the case of the discrepancies. Because $F$ is a
cumulative distribution function, its inverse $F^{-1} : [0,1] \to \mathbb{R}$ is uniquely defined, and we can
re-write the statistic as

$$W_N^2 = N \int_K \big| \langle\vartheta_{F^{-1}(y)}\rangle_N - y \big|^2\, dy , \tag{3.64}$$

where we denote $K := [0,1]$. But $\vartheta_{F^{-1}(y)}(x) = \vartheta_y(F(x))$, so that $W_N^2$ is equal to the
$L_2^*$-discrepancy of the points $F(x_k)$. The interpretation of the statistic is slightly different from the
$L_2^*$-discrepancy, but the probability distribution is exactly the same.
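
The equivalence can be checked numerically. The sketch below evaluates the Cramér-von Mises statistic with the standard textbook computing formula $\frac{1}{12N} + \sum_i \big(F(x_{(i)}) - \frac{2i-1}{2N}\big)^2$ for the ordered sample, and compares it with the sum over the reduced two-point function of the one-dimensional $L_2^*$-discrepancy evaluated at the points $F(x_k)$. The exponential test distribution, the seed and the function names are our own choices for illustration.

```python
import numpy as np

def l2_star_1d(u):
    """(1/N) sum_{k,l} B(u_k, u_l) for points u in [0, 1]."""
    u = np.asarray(u, dtype=float)
    x, y = u[:, None], u[None, :]
    b = (1.0 - np.maximum(x, y)
         - (1.0 - x**2) / 2.0 - (1.0 - y**2) / 2.0 + 1.0 / 3.0)
    return b.sum() / u.size

def cramer_von_mises(data, cdf):
    """Textbook computing formula for the Cramer-von Mises statistic."""
    u = np.sort(cdf(np.asarray(data, dtype=float)))
    n = u.size
    i = np.arange(1, n + 1)
    return 1.0 / (12.0 * n) + np.sum((u - (2.0 * i - 1.0) / (2.0 * n)) ** 2)

def exponential_cdf(t):
    return 1.0 - np.exp(-t)

rng = np.random.default_rng(1)
sample = rng.exponential(size=25)
print(cramer_von_mises(sample, exponential_cdf),
      l2_star_1d(exponential_cdf(sample)))
```

Both numbers agree up to rounding, for any sample and any continuous $F$, since the two expressions are algebraically identical.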

3.3.3 The Fourier diaphony 

For the Fourier diaphony, $\mathcal{P}$ should impose periodic boundary conditions. In one dimension, a
simple Hilbert-Schmidt operator that achieves this is given by

$$(\mathcal{P}\phi)(x) := \frac{\sqrt{3}}{\pi} \int_K k_1(x-y)\,\phi(y)\,dy , \qquad k_1(x) := 2\pi\big(\{x\} - \tfrac{1}{2}\big) , \tag{3.65}$$

where $\{x\} := x \bmod 1$. The term of a half in the integration kernel assures that $\mathcal{P}\phi$ integrates to
zero, so that the discrepancy is formulated in the Landau gauge from the start. The choice of the
factors seems odd, but will appear to be the natural choice for the extension to more dimensions.
The two-point function is given by

$$\mathcal{B}(x_1,x_2) = \langle \mathcal{P}^T\delta_{x_1}\, \mathcal{P}^T\delta_{x_2} \rangle = 1 - 6\{x_1-x_2\}\big(1-\{x_1-x_2\}\big) . \tag{3.66}$$

Notice that the two-point function only depends on $x_1 - x_2$ and therefore is translation invariant,
i.e., $\mathcal{B}(x_1+a, x_2+a) = \mathcal{B}(x_1,x_2)$ for all $a \in [0,1]$. As a result of this, all information about
$\mathcal{B}$ is contained in the function $\mathcal{B}_1 : x \mapsto \mathcal{B}(x,0)$, and we have

$$\mathcal{B}(x_1,x_2) = \mathcal{B}_1(x_1-x_2) . \tag{3.67}$$

The factor $\sqrt{3}/\pi$ was chosen such that $\mathcal{B}_1(0) = 1$. The set $\{u_n\}$ of solutions of the eigenvalue
equation $\int_K \mathcal{B}(x,y)\,u(y)\,dy = \sigma^2 u(x)$ is just the Fourier basis on $[0,1]$ without the constant
mode:

$$u_{2n-1}(x) = \sqrt{2}\,\sin(2\pi nx) , \qquad u_{2n}(x) = \sqrt{2}\,\cos(2\pi nx) , \qquad n = 1, 2, \ldots , \tag{3.68}$$

with eigenvalues

$$\sigma_{2n-1}^2 = \sigma_{2n}^2 = 3\,\pi^{-2} n^{-2} , \qquad n = 1, 2, \ldots . \tag{3.69}$$

The function $u_0 : x \mapsto 1$ is not a member of the basis because of the Landau gauge. Only
functions that integrate to zero are present.

In $s > 1$ dimensions, the operator $\mathcal{P}$ is extended as follows. Let $\ominus$ denote coordinate-wise
subtraction, then

$$(\mathcal{P}\phi)(x) := \frac{1}{\sqrt{(1+\pi^2/3)^s - 1}} \int_K k_s(x\ominus y)\,\phi(y)\,dy , \tag{3.70}$$

with

$$k_s(x) := -1 + \prod_{\nu=1}^{s} \big(1 + k_1(x^\nu)\big) . \tag{3.71}$$

The $s$-dimensional integration kernel is obtained from the one-dimensional one by adding the
constant mode and taking the product over the coordinates. The extra term of $-1$ assures that the
constant mode in $s$ dimensions disappears again. The new factor assures that the $s$-dimensional
two-point function is equal to one in the origin:

$$\mathcal{B}_s(x) = \frac{1}{(1+\pi^2/3)^s - 1} \left[ -1 + \prod_{\nu=1}^{s} \Big( 1 + \frac{\pi^2}{3}\,\mathcal{B}_1(x^\nu) \Big) \right] . \tag{3.72}$$



The basis in $s$ dimensions consists of all products over coordinates of the one-dimensional
basis including the constant mode $u_0$. The only product that does not appear is, of course,
$\prod_{\nu=1}^{s} u_0(x^\nu)$. The eigenvalue coming with $u_0$ is determined by the choice of $k_s$, and equal
to $1$. The eigenvalues in $s$ dimensions are just the properly normalized products of the
one-dimensional ones. If we denote $\vec n = (n_1, n_2, \ldots, n_s)$ and introduce

$$k_\nu(\vec n) := \begin{cases} \tfrac{1}{2}\, n_\nu & \text{if } n_\nu \text{ is even,} \\ \tfrac{1}{2}(n_\nu + 1) & \text{if } n_\nu \text{ is odd,} \end{cases} \tag{3.73}$$

then

$$\sigma_{\vec n}^2 := \sigma^2(\vec k(\vec n)) := \frac{1}{(1+\pi^2/3)^s - 1} \prod_{\nu=1}^{s} \frac{1}{r(k_\nu)^2} , \qquad r(k_\nu) := \begin{cases} k_\nu & \text{if } k_\nu \ne 0 , \\ 1 & \text{if } k_\nu = 0 . \end{cases} \tag{3.74}$$

The Fourier diaphony is often written in terms of the complex Fourier basis of $K$. Then, it attains
the form

$$D_N = \frac{1}{N} \sum_{\vec k} \sigma^2(\vec k)\, \Big| \sum_{l=1}^{N} e^{2\pi i\, \vec k\cdot x_l} \Big|^2 , \tag{3.75}$$

where $\vec k\cdot x := k_1x^1 + k_2x^2 + \cdots + k_sx^s$, and the first sum is over all $\vec k \in \mathbb{Z}^s$ except the constant
mode $\vec k = (0,0,\ldots,0)$. Introduced as in this section, the diaphony is again $N$ times the square
of the definition as given in, for example, [19].
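
In practice the diaphony is most easily evaluated in position space, as a double sum over the translation-invariant two-point function $\mathcal{B}_s(x_k \ominus x_l)$ built from $\mathcal{B}_1(x) = 1 - 6\{x\}(1-\{x\})$. The following is a minimal sketch of that evaluation; the function name and the array layout (one row per point) are our own assumptions.

```python
import numpy as np

def diaphony(points):
    """N times the squared Fourier diaphony of an (N, s) array of points.

    Evaluates (1/N) sum_{k,l} B_s(x_k - x_l), with the coordinate-wise
    differences taken modulo 1.
    """
    x = np.asarray(points, dtype=float)
    n, s = x.shape
    d = np.mod(x[:, None, :] - x[None, :, :], 1.0)   # {x_k - x_l}
    b1 = 1.0 - 6.0 * d * (1.0 - d)                   # one-dimensional B_1
    c = np.pi**2 / 3.0
    bs = (np.prod(1.0 + c * b1, axis=2) - 1.0) / ((1.0 + c) ** s - 1.0)
    return bs.sum() / n

# Equidistant points in one dimension (illustrative choice).
n_points = 8
grid = (np.arange(n_points) / n_points).reshape(-1, 1)
print(diaphony(grid))
```

For $N$ equidistant points in one dimension only the Fourier modes with wave number a multiple of $N$ survive, and the sum collapses to $D_N = 1/N$, which the sketch reproduces.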



3.3.4 The Lego discrepancy and the $\chi^2$-statistic

For the Lego discrepancy, the image $C := \mathcal{P}H$ is a finite dimensional vector space. It is obtained
by dividing $K$ into $M$ disjoint 'bins' $A_n$, with $\bigcup_{n=1}^{M} A_n = K$, and taking

$$(\mathcal{P}\phi)(x) := \sum_{n=1}^{M} \frac{\sigma_n}{\sqrt{w_n}}\, \vartheta_n(x)\, \langle\vartheta_n\phi\rangle , \tag{3.76}$$

where

$$\vartheta_n := \vartheta_{A_n} , \qquad w_n := \langle\vartheta_n\rangle , \tag{3.77}$$

and, in first instance, the strengths $\sigma_n$ are not specified. $\mathcal{P}$ maps $H$ onto the space of functions that
are defined with a precision up to the size of the bins $A_n$. Notice that $\vartheta_n\vartheta_m = \delta_{n,m}\vartheta_m$, where
$\delta_{n,m}$ is the Kronecker delta symbol, and that $\sum_{n=1}^{M}\vartheta_n = 1$, $\sum_{n=1}^{M} w_n = 1$. The two-point
function is given by

$$\mathcal{C}(x_1,x_2) = \sum_{n=1}^{M} \sigma_n^2\, \vartheta_n(x_1)\,\vartheta_n(x_2) . \tag{3.78}$$

Clearly, this model is dimension-independent, in the sense that the only information on the
dimension of $K$ is that contained in the value of $M$: if the dissection of $K$ into bins is of the
hyper-cubic type with $p$ bins along each axis, then we shall have $M = p^s$. Also, a general
area-preserving mapping of $K$ onto itself will leave the discrepancy invariant: it will lead to a
distortion (and possibly a dissection) of the various bins, but this influences neither $w_n$ nor (by
definition) $\sigma_n$. Owing to the finiteness of $M$, a finite point set can, in fact, have zero discrepancy
in this case, namely if every bin $A_n$ contains precisely $w_nN$ points (assuming this number to be
integer for every $n$).

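Because $C$ is $M$-dimensional, it is easiest to formulate everything in $\mathbb{R}^M$. We define

$$\phi^M := (\phi_1, \phi_2, \ldots, \phi_M) , \qquad \phi_n = \frac{1}{w_n}\langle\vartheta_n\phi\rangle , \tag{3.79}$$

and divide $H$ into equivalence classes by the prescription that $\phi \sim \psi$ if $\phi^M = \psi^M$. This space is
$C$, and it is isomorphic to $\mathbb{R}^M$ with inner product $\langle\phi\,\psi\rangle := \sum_{n=1}^{M} w_n\phi_n\psi_n$. The operator $\mathcal{P}$
restricted to $\mathbb{R}^M$ is given by

$$(\mathcal{P}\phi)_n = \sigma_n\sqrt{w_n}\;\phi_n . \tag{3.80}$$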



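Notice that $\mathcal{P}$ is self-adjoint. The Gaussian measure $\mu$ can now be defined rigorously in terms of
a finite-dimensional path integral. If $F$ is a functional on $C$, then

$$\int F[\phi]\, d\mu[\phi] = \int [d\phi^M]\; F\Big[\sum_{n=1}^{M} \phi_n\vartheta_n\Big]\, \exp(-S_0[\phi^M]) , \tag{3.81}$$

with

$$[d\phi^M] = \prod_{n=1}^{M} \frac{d\phi_n}{\sqrt{2\pi\sigma_n^2}} \qquad\text{and}\qquad S_0[\phi^M] = \frac{1}{2}\sum_{n=1}^{M} \frac{\phi_n^2}{\sigma_n^2} . \tag{3.82}$$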



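The two-point function and the reduced two-point function can be written in terms of matrices
as

$$\mathcal{C}(x_1,x_2) = \sum_{n,m=1}^{M} \mathcal{C}_{n,m}\,\vartheta_n(x_1)\,\vartheta_m(x_2) , \qquad \mathcal{B}(x_1,x_2) = \sum_{n,m=1}^{M} \mathcal{B}_{n,m}\,\vartheta_n(x_1)\,\vartheta_m(x_2) , \tag{3.83}$$

with

$$\mathcal{C}_{n,m} = \sigma_n^2\,\delta_{n,m} , \qquad \mathcal{B}_{n,m} = \sigma_n^2\,\delta_{n,m} - \sigma_n^2 w_n - \sigma_m^2 w_m + \sum_{k=1}^{M} \sigma_k^2 w_k^2 . \tag{3.84}$$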



In the path integral formulation of the generating function, $\langle e^{g\phi}\rangle$ occurs, and the series expansion
of $\exp$ and the properties of the characteristic functions tell us that $\langle e^{g\phi}\rangle = \sum_{n=1}^{M} w_n e^{g\phi_n}$,
so that the generating function is given by

$$G(z) = \int [d\phi^M]\, \exp\!\Big( -S_0[\phi^M] - gN\sum_{n=1}^{M} w_n\phi_n + N\log\Big(\sum_{n=1}^{M} w_n e^{g\phi_n}\Big) \Big) . \tag{3.85}$$

The discrepancy itself can be written as

$$D_N = \frac{1}{N}\sum_{k,l=1}^{N} \mathcal{B}(x_k,x_l) = \frac{1}{N}\sum_{n,m=1}^{M} S_n\,\mathcal{B}_{n,m}\,S_m , \qquad\text{where}\qquad S_n := \sum_{k=1}^{N} \vartheta_n(x_k) \tag{3.86}$$

is the number of points in bin $A_n$.
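
Because the Lego path integral is finite-dimensional, the Gaussian-integral form of $G(z)$ can be tested numerically against a direct sum over the bin counts. The sketch below does this for $M = 2$ bins. It uses $D_N = \frac{1}{N}\sum_n \sigma_n^2 (S_n - w_nN)^2$, which follows from the matrix form of the reduced two-point function, the binomial distribution of the counts for uniformly random points, and Gauss-Hermite quadrature for the two Gaussian integrals with variances $\sigma_n^2$. Function names, the quadrature degree and the parameter values are our own assumptions.

```python
import numpy as np
from math import comb
from numpy.polynomial.hermite_e import hermegauss

def g_enumeration(z, n_points, w1, sigma2):
    """E[exp(z D_N)] for M = 2 bins, summing over the count in bin 1."""
    w = np.array([w1, 1.0 - w1])
    total = 0.0
    for s1 in range(n_points + 1):
        counts = np.array([s1, n_points - s1])
        d_n = np.sum(sigma2 * (counts - n_points * w) ** 2) / n_points
        prob = comb(n_points, s1) * w[0] ** s1 * w[1] ** (n_points - s1)
        total += prob * np.exp(z * d_n)
    return total

def g_gaussian(z, n_points, w1, sigma2, deg=60):
    """The same quantity as a two-dimensional Gaussian integral (z >= 0)."""
    w = np.array([w1, 1.0 - w1])
    g = np.sqrt(2.0 * z / n_points)
    nodes, weights = hermegauss(deg)             # weight function exp(-x^2/2)
    weights = weights / np.sqrt(2.0 * np.pi)     # normalise to total mass 1
    phi1 = np.sqrt(sigma2[0]) * nodes[:, None]   # phi_n has variance sigma_n^2
    phi2 = np.sqrt(sigma2[1]) * nodes[None, :]
    mean = w[0] * phi1 + w[1] * phi2             # <phi> = sum_n w_n phi_n
    inner = w[0] * np.exp(g * (phi1 - mean)) + w[1] * np.exp(g * (phi2 - mean))
    return np.sum(weights[:, None] * weights[None, :] * inner ** n_points)

w1 = 0.3
sigma2 = 1.0 / np.array([w1, 1.0 - w1])          # illustrative strengths
print(g_enumeration(0.1, 5, w1, sigma2), g_gaussian(0.1, 5, w1, sigma2))
```

The two numbers agree to quadrature accuracy. Only real $z \ge 0$ is covered here, since $g$ is taken real.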
3.3.4.1 The $\chi^2$-statistic

We did not yet specify the strengths $\sigma_n$, but we will in particular look at the choice for which
$\sigma_n^2 w_n = 1$ for all $n = 1, 2, \ldots, M$. In this case, $C$ consists of functions in which the largest
fluctuations appear over the smallest intervals. Although not a priori attractive in many cases,
this choice is actually quite appropriate for, e.g., particle physics, where cross sections display
precisely this kind of behavior. The reduced two-point function attains the simple form
$\mathcal{B}_{n,m} = \delta_{n,m}/w_n - 1$, and the discrepancy becomes

$$D_N = \sum_{n=1}^{M} \frac{(S_n - w_nN)^2}{w_nN} , \tag{3.87}$$

which is nothing but the $\chi^2$-statistic for $N$ data points distributed over $M$ bins with expected
number of points $w_nN$ (cf. [2]).
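
The identification with the $\chi^2$-statistic is easy to check numerically. The sketch below evaluates the discrepancy once through the matrix $\mathcal{B}_{n,m} = \delta_{n,m}/w_n - 1$ and once through the usual Pearson form. The function names and the example counts are our own, chosen for illustration.

```python
import numpy as np

def lego_discrepancy_chi2(counts, w):
    """D_N = (1/N) sum_{n,m} S_n B_{n,m} S_m with B_{n,m} = delta_{n,m}/w_n - 1."""
    counts = np.asarray(counts, dtype=float)
    w = np.asarray(w, dtype=float)
    n_points = counts.sum()
    b = np.diag(1.0 / w) - 1.0       # subtracts 1 from every matrix element
    return counts @ b @ counts / n_points

def pearson_chi2(counts, w):
    """Pearson's chi-square: sum_n (S_n - w_n N)^2 / (w_n N)."""
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(w, dtype=float) * counts.sum()
    return np.sum((counts - expected) ** 2 / expected)

counts = [3, 5, 2]                   # N = 10 points in M = 3 bins
w = [0.2, 0.5, 0.3]                  # bin volumes, summing to one
print(lego_discrepancy_chi2(counts, w), pearson_chi2(counts, w))
```

Both expressions vanish when every bin contains exactly $w_nN$ points, in line with the remark that a finite point set can have zero Lego discrepancy.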



3.4 Appendices 

Appendix 3A 

Let $H := L_2(K)$ be the Hilbert space of (equivalence classes of almost everywhere equal) real
quadratically integrable functions on $K$, with inner product and norm

$$\langle f, g\rangle := \int_K f(x)\,g(x)\,dx , \qquad \|f\|_2 := \sqrt{\langle f, f\rangle} . \tag{3.88}$$

A Hilbert space is self-dual, i.e. there is an isomorphism between $H$ and its dual space $H'$ of
continuous linear functions $H \to \mathbb{R}$. It induces an invertible mapping $H' \ni \eta \mapsto f_\eta \in H$ such
that $\langle f_\eta, g\rangle = \eta[g]$ for all $g \in H$, and we write $\|\eta\|_2 := \|f_\eta\|_2$.

Let $\mathcal{P}$ be a Hilbert-Schmidt operator on $H$, and $\mathcal{P}^T$ its transposed which acts on $H'$ through
the definition $\mathcal{P}^T\eta := \eta\circ\mathcal{P}$. It is easy to see that $\mathcal{P}^T$ is a Hilbert-Schmidt operator on $H'$ and that
$\|\mathcal{P}^T\eta\|_2 = \|\mathcal{P}^\dagger f_\eta\|_2$ exists for every $\eta \in H'$. Furthermore, it is well known (cf. [4]) that there exists
a Gaussian measure $\mu$ on $H$ with Fourier transform

$$\int_H \exp(i\eta[f])\, d\mu[f] = \exp\!\big(-\tfrac{1}{2}\|\mathcal{P}^T\eta\|_2^2\big) . \tag{3.89}$$

By inserting $\lambda\eta$, where $\lambda$ is a real variable, and differentiating the above equation twice with
respect to $\lambda$ before putting it to zero, one obtains the relation

$$\int_H \eta[f]^2\, d\mu[f] = \|\mathcal{P}^T\eta\|_2^2 . \tag{3.90}$$

With $\mu$, a Hilbert space $L_2(H,\mu)$ can be defined, where the norm is given by

$$\|\eta\|_{L_2(H,\mu)} := \left( \int_H \eta[f]^2\, d\mu[f] \right)^{1/2} . \tag{3.91}$$



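It is clear that a mapping $\mathcal{N} : H' \to L_2(H,\mu)$ can directly be defined on the whole of $H'$, but we
need more: we want to apply it to Dirac measures, which $H'$ does not contain. Consider therefore
the Hilbert space $H'_{\mathcal{P}}$, which is the completion of $H'$ under the norm

$$\|\eta\|_{\mathcal{P}} := \|\mathcal{P}^T\eta\|_2 . \tag{3.92}$$

$\mathcal{P}^T$ can be interpreted as a continuous mapping from $H'_{\mathcal{P}}$ to $H'$, with $\mathcal{P}^T H'_{\mathcal{P}} = H'$. Furthermore, $\mathcal{N}$
can be extended to the whole of $H'_{\mathcal{P}}$, since it is an isometry:

$$\forall\, \eta \in H'_{\mathcal{P}} : \quad \|\mathcal{N}\eta\|^2 = \int_H \eta[f]^2\, d\mu[f] = \|\eta\|_{\mathcal{P}}^2 . \tag{3.93}$$

Now, suppose that $\mathcal{P}$ maps $H$ continuously onto a space $C$ of continuous functions, such that

$$\|\mathcal{P}f\|_\infty := \sup_{x\in K} |(\mathcal{P}f)(x)| \le \rho\,\|f\|_2 \qquad\text{for some } \rho > 0 . \tag{3.94}$$

The dual space $C'$ of $C$ consists of bounded measures on $K$, and contains the Dirac measures.
Then we have for every $\eta \in C'$ and every $f \in H$:

$$|(\mathcal{P}^T\eta)[f]| = |\eta[\mathcal{P}f]| \le \|\eta\|\,\|\mathcal{P}f\|_\infty \le \rho\,\|\eta\|\,\|f\|_2 , \tag{3.95}$$

so that $\mathcal{P}^T$ maps $C'$ continuously onto $H'$. Therefore, $C' \subset H'_{\mathcal{P}}$, and $\mathcal{N}$ can be applied to $C'$.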



Appendix 3B 

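For the proof that (3.94) holds in case of the Fourier diaphony, we use that there is obviously a
number $\rho$ such that

$$\frac{|k_s(x\ominus y)|}{\sqrt{(1+\pi^2/3)^s - 1}} \le \rho \qquad\text{for all } x, y \in K , \tag{3.96}$$

so that

$$|(\mathcal{P}f)(x)| \le \rho \int_K |f(y)|\,dy \qquad\text{for all } x \in K . \tag{3.97}$$

Now, we can apply the Cauchy-Schwarz inequality, with the result that for all $x \in K$

$$|(\mathcal{P}f)(x)| \le \rho \left( \int_K 1\,dy \right)^{1/2} \left( \int_K |f(y)|^2\,dy \right)^{1/2} = \rho\,\|f\|_2 . \tag{3.98}$$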



Chapter 4 

Instantons for discrepancies 



It is mentioned in Section 3.2.3 that an expansion of the path integral representation of the generating function of quadratic discrepancies around the trivial solution of the field equation is only correct if this solution gives the minimum of the action. Furthermore, it is suggested that the non-trivial solutions, called instantons, might spoil the perturbation expansion if they exist for real values of the order parameter $z$ of the generating function that are arbitrarily close to 0. In this chapter, we take a closer look at the issue for the Lego discrepancy and the $L_2^*$-discrepancy in one dimension, and show that instantons exist but do not threaten the perturbative expansion.

For the $L_2^*$-case, a method had to be developed to analyze, with the numerical help of a computer, the singularity structures of the solutions of implicit function equations; it is presented in Section 4.4.
4.1 An alternative derivation of the path integral formulation

We start with an alternative derivation of the representation of the generating function as a path integral. For the Lego discrepancy, this goes as follows. We consider the case for which $\sigma_n^2 w_n = 1$ for all $n = 1,\ldots,M$, so that the discrepancy is just the $\chi^2$-statistic

$$ D_N = \sum_{n=1}^{M} \frac{S_n^2}{w_n N} - N , \quad \text{where} \quad S_n := \sum_{k=1}^{N} \vartheta_n(x_k) \tag{4.1} $$

is the number of points in bin $n$. If the points are truly randomly distributed, the variables $S_n$ are distributed according to a multinomial distribution, so that the generating function is given by

$$ E(e^{zD_N}) = \sum_{\{S_n\}} N! \left(\prod_{n=1}^{M} \frac{w_n^{S_n}}{S_n!}\right) \exp\!\left(\sum_{n=1}^{M} \frac{z S_n^2}{w_n N} - zN\right) , \tag{4.2} $$

where the summation is over all configurations $\{S_n\}$ which satisfy $\sum_{n=1}^{M} S_n = N$. Notice that $E(e^{zD_N}) > w_n^N \exp(zN/w_n - zN)$ for every $n$, which is the contribution of the configuration with all points in bin $n$, so that the generating function is not defined if $N \to \infty$ for the values of $z$ with $\mathrm{Re}\,z > -\frac{w_n}{1-w_n}\log w_n$. Using Gaussian integration rules and the generalized binomial theorem, it is easy to see that Eq. (4.2) can be written as

$$ E(e^{zD_N}) = e^{-zN} \left(\prod_{n=1}^{M} \sqrt{\frac{w_n}{2\pi}}\right) \int_{\mathbb{R}^M} \exp\!\left(-\frac{1}{2}\sum_{n=1}^{M} w_n\phi_n^2\right) \left(\sum_{n=1}^{M} w_n e^{g\phi_n}\right)^{N} d^M\phi , \tag{4.3} $$

with $g = \sqrt{2z/N}$. By writing the $N$-th power as a power of $e$ and substituting $\phi_n \to \phi_n + Ng$, the path integral of Eq. (3.85) is obtained.
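
As a quick numerical illustration of Eq. (4.1), the following Python sketch samples truly random points, counts them per bin and evaluates the $\chi^2$ form of the Lego discrepancy. The bin volumes, sample sizes, seed and function names are illustrative assumptions, not taken from the text. The check relies on the standard multinomial result $E(D_N) = M - 1$.

```python
import random

def lego_discrepancy(counts, weights, n_points):
    """Chi-square form of Eq. (4.1): sum_n S_n^2 / (w_n N) - N."""
    return sum(s * s / (w * n_points) for s, w in zip(counts, weights)) - n_points

def sample_counts(weights, n_points, rng):
    """Throw n_points uniform points and count how many fall in each bin."""
    counts = [0] * len(weights)
    for _ in range(n_points):
        u, cumulative = rng.random(), 0.0
        for n, w in enumerate(weights):
            cumulative += w
            if u < cumulative:
                counts[n] += 1
                break
        else:
            counts[-1] += 1  # guards against rounding in the cumulative sum
    return counts

def mean_discrepancy(weights, n_points, n_samples, seed=12345):
    """Monte Carlo estimate of E(D_N) for truly random points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        counts = sample_counts(weights, n_points, rng)
        total += lego_discrepancy(counts, weights, n_points)
    return total / n_samples

weights = [0.1, 0.2, 0.3, 0.4]  # hypothetical bin volumes w_n, summing to 1
estimate = mean_discrepancy(weights, n_points=50, n_samples=4000)
print("estimated E(D_N):", estimate, " expected M - 1 =", len(weights) - 1)
```

The configuration with all points in one bin gives $D_N = N/w_n - N$, which is the term used above to bound the generating function from below.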

For the $L_2^*$-discrepancy in one dimension, $\mathcal{P}^{-1}\phi = \phi' := \frac{d\phi}{dx}$ with the boundary condition that $\phi(1) = 0$ (Section 3.3.1). The action is therefore given by

= ^(((l)']') + ^M(t)(l)2-Nlog(e9*) , (4.4) 

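where $M \to \infty$. We show now that there is a naive continuum limit with this result. We use the fact that the discrepancy can be defined as the naive continuum limit of

$$ D_N^{(M)} = \frac{1}{N} \sum_{p=1}^{M} \sigma_p^2 \left(\sum_{k=1}^{N}\sum_{n=1}^{M} K_n^p\,[\vartheta_n(x_k) - w_n]\right)^2 , \tag{4.5} $$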



where 

= n 3 , Wn , KP = e(n-p) , and ^1 = ^ ■ (4-6) 



is the discretized version of the $L_2^*$-discrepancy, obtained when in Eq. (3.6) the average over a finite number of points $y_n$, $n = 1,\ldots,M$ is taken, instead of the average over the whole of $K$. Notice that a whole class of 'discrete' discrepancies can be written as Eq. (4.5), by choosing different expressions for the $K_n^p$ and the $\sigma_p$. Just like the Lego discrepancy, such a discrepancy can be written in terms of variables $S_n$ that count the number of points in bin $n$:

$$ D_N^{(M)} = \frac{1}{N}\sum_{n,m=1}^{M} R_{nm}S_nS_m - 2\sum_{n=1}^{M} T_nS_n + NU , \tag{4.7} $$

with

$$ R_{nm} = \sum_{p=1}^{M} \sigma_p^2 K_n^p K_m^p , \quad T_n = \sum_{m=1}^{M} R_{nm}w_m , \quad \text{and} \quad U = \sum_{n,m=1}^{M} R_{nm}w_nw_m . \tag{4.8} $$

In the case of the $L_2^*$-type discrepancy, the matrix $R$ is given by $R_{nm} = \min(M-n, M-m)/M$. The generating function is again given as the expectation value under the multinomial distribution. If we assume that the matrix $R$ is invertible and positive definite, as it is for the $L_2^*$-type discrepancy, use the Gaussian integration rules and the generalized binomial theorem and do the appropriate coordinate transformations, we find

$$ G(z) = \frac{1}{\sqrt{(2\pi)^M \det R}} \int_{\mathbb{R}^M} \exp(-S_M[\phi])\,d^M\phi , \tag{4.9} $$

with

$$ S_M[\phi] = \frac{1}{2}\sum_{n,m=1}^{M} R^{-1}_{nm}\phi_n\phi_m + Ng\sum_{n=1}^{M} w_n\phi_n - N\log\!\left(\sum_{n=1}^{M} w_n e^{g\phi_n}\right) . \tag{4.10} $$

For the $L_2^*$-type discrepancy the inverse $R^{-1}$ of the matrix $R$ is easy to find and we get

$$ \sum_{n,m=1}^{M} R^{-1}_{nm}\phi_n\phi_m = M\phi_M^2 + M\sum_{n=1}^{M-1} (\phi_{n+1} - \phi_n)^2 , \tag{4.11} $$

so that a naive continuum limit clearly produces Eq. (4.4).
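
The statement that $R^{-1}$ is a nearest-neighbour (tridiagonal) form can be checked exactly for small $M$. The sketch below is only an illustration: it assumes the labels run over $n, m = 0, \ldots, M-1$ (so that $R$ is invertible), and the function names are hypothetical. It builds the matrix of the quadratic form $M\phi_{\mathrm{last}}^2 + M\sum_n(\phi_{n+1}-\phi_n)^2$ and verifies with rational arithmetic that it is the inverse of $R$.

```python
from fractions import Fraction

def r_matrix(m):
    """R_nm = min(M - n, M - m) / M, with the assumed labels n, m = 0..M-1."""
    return [[Fraction(min(m - a, m - b), m) for b in range(m)] for a in range(m)]

def quadratic_form_matrix(m):
    """Matrix of M*phi_last^2 + M*sum_n (phi_{n+1} - phi_n)^2."""
    q = [[Fraction(0)] * m for _ in range(m)]
    q[m - 1][m - 1] += m              # boundary term M * phi_last^2
    for n in range(m - 1):            # nearest-neighbour differences
        q[n][n] += m
        q[n + 1][n + 1] += m
        q[n][n + 1] -= m
        q[n + 1][n] -= m
    return q

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def is_identity(a):
    return all(a[i][j] == (1 if i == j else 0)
               for i in range(len(a)) for j in range(len(a)))

M = 6
product = matmul(r_matrix(M), quadratic_form_matrix(M))
print("R times the tridiagonal form is the identity:", is_identity(product))
```

In the continuum limit the difference term becomes $\int (\phi')^2\,dx$ and the boundary term forces the field to vanish at the end point, which is the structure referred to in the text.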



4.2 Instantons for the Lego discrepancy 

We start this section with a repetition of the statement that non-trivial instanton solutions only exist if $z \in [0,\infty)$ (Section 3.2.3). In order to investigate the instantons, we analyze the action in terms of the variables $y_n = g\phi_n + 2z$, that is, we consider the integral $\int_{\mathbb{R}^M} \exp(-N\Sigma[y])\,d^My$, with

$$ \Sigma[y] = z + \frac{1}{4z}\sum_n w_ny_n^2 - \log\Big(\sum_n w_ne^{y_n}\Big) . \tag{4.12} $$

The sum is over $n = 1,\ldots,M$. We are interested in the minima of $\Sigma$. The 'perturbative' minimum $\phi_n = 0$, $n = 1,\ldots,M$ corresponds to $y_n = 2z$, $n = 1,\ldots,M$, and general extrema of $\Sigma$ are situated at points $y$ which are solutions of the equations

$$ \frac{\partial\Sigma}{\partial y_k}(y) = 0 \quad\Longleftrightarrow\quad \frac{2z\,e^{y_k}}{y_k} = \sum_n w_ne^{y_n} , \quad k = 1,\ldots,M . \tag{4.13} $$

If $z$ is positive, $e^{y_k}/y_k$, and therefore $y_k$, has to be positive for every $k$. The result is that the $y_k$ can take at most two values in one solution $y$ (Fig. 4.1). If they all take the same value, this value is $2z$, and we get the perturbative solution. If they take two values, one of them, $y_+$, is larger than 1 and the other, $y_-$, is smaller than 1. With these results, and the fact that Eq. (4.13) implies that

$$ \sum_n w_ny_n = 2z , \tag{4.14} $$

we see that there are no solutions but the perturbative one if $2z < w_{\min}$, where $w_{\min} = \min_n w_n$.

In the next section, the other extremal points will be analyzed and it will appear that minima occur with $\Sigma[y] < 0$. This means that, in the limit of $N \to \infty$, the integral of $\exp(-N\Sigma)$ is not defined; there is a 'wall' in the complex $z$ plane along the positive real side of the imaginary axis, to the right of which the generating function is not defined. That this is not an artifact of our approach can be seen in the expression of the generating function given by Eq. (4.2). It is shown there that the generating function is not defined if $\mathrm{Re}\,z > -\frac{w_n}{1-w_n}\log w_n$ for any one of the $w_n$.

We know (Section 6.2) that, at the perturbative level, the generating function has a singularity at $z = \frac{1}{2}$, but the instanton contributions cannot correspond with it, because they will appear already for $\mathrm{Re}\,z < \frac{1}{2}$. However, in order to calculate the probability density $H$ with the Laplace transform using the perturbative expression of $G(z)$, we can just calculate the contribution of the singularity at $z = \frac{1}{2}$, for that is the contribution to the perturbative expansion of $H(t)$.
4.2.1 The wall

To expose the nature of the extrema of $\Sigma$, we have to investigate the eigenvalues $\lambda$ of the second derivative matrix $A$ of $\Sigma$ in the extremal points. This matrix is given by

$$ A_{kl}(y) := \frac{\partial^2\Sigma}{\partial y_k\partial y_l}(y) = a_k(y)\delta_{kl} + b_k(y)b_l(y) , \tag{4.15} $$

with

$$ a_k(y) = \frac{w_k}{2z}(1 - y_k) \quad\text{and}\quad b_k(y) = \frac{w_ky_k}{2z} . \tag{4.16} $$

To show that $\Sigma$ becomes negative, we only use its minima, and these correspond with extremal points in which all eigenvalues of $A$ are positive. According to Appendix 4A, we are therefore only interested in cases where the degeneracy of negative $a_k$ is one, for else $\lambda = a_k$ would be a solution. We further are only interested in cases where there is only one negative $a_k$, for if there were more, say $a_k$ and $a_{k+1}$ with $a_k < a_{k+1}$, then there would be a solution $a_k < \lambda < a_{k+1} < 0$. So we see that the only extremal points we are interested in have all co-ordinates equal, or have one $y_k = y_+$ and the others equal to $y_-$. If they are all equal, then they have to be equal to $2z$, and for the extremal point to be a minimum $2z$ has to be smaller than 1. This is the perturbative minimum. Whether the other extremal points are minima depends on whether $\det A$ is positive in these points. The determinant can be written as

$$ \det A(y) = \frac{1}{2z}\Big(\prod_{n=1}^{M}\frac{w_n}{2z}\Big)(1-y_-)^{M-1}(y_+-1)\left(\frac{w_+y_+}{y_+-1} + \frac{(1-w_+)y_-}{y_--1}\right) . \tag{4.17} $$

Now we notice that all extremal points can be labeled with a parameter $v$ by defining

$$ \frac{e^{y_\pm(v)}}{y_\pm(v)} = e^{v} \quad\text{with}\quad v \in (1,\infty) . \tag{4.18} $$

We see that $y_\pm$ is a continuous and differentiable function of $v$ and we have that $dy_\pm/dv = y_\pm/(y_\pm - 1)$. This parameterization induces a parameterization of $2z$, and with the help of Eq. (4.14) we see that

$$ \frac{d(2z)}{dv} = \frac{w_+y_+}{y_+-1} + \frac{(1-w_+)y_-}{y_--1} . \tag{4.19} $$

So we see that the sign of $\det A$ is the same as the sign of $d(2z)/dv$: if an extremal point is a minimum, then $d(2z)/dv > 0$. The minimal value that $v$ can take to represent a solution is 1, which corresponds to $y_+ = y_- = 1$ and $2z = 1$. It is easy to see that $d(2z)/dv \to -\infty$ if $v \downarrow 1$ and $w_+ < \frac{1}{2}$, where $w_+$ is the value of the weight belonging to the co-ordinate with the value $y_+$. This means that if $v$ starts from $v = 1$ and increases, then it will represent solutions with $d(2z)/dv < 0$, which are local maxima. We know that, if $v \to \infty$, then $y_- \to 0$, $y_+ \to \infty$ and $2z = w_+y_+ + (1-w_+)y_- \to \infty$, so that $d(2z)/dv$ has to become larger than 0 at some point. The first point where $2z$ becomes equal to 1 again we call $v_c$ (Fig. 4.2), so

$$ z(v_c) = z(1) = \tfrac{1}{2} . \tag{4.20} $$
Figure 4.2: $\Sigma$ and $z$ for instanton solutions parameterized with $v$.

Also the function $\Sigma$ itself can be written in terms of $z(v)$ in the extremal points. We use that

$$ \frac{d}{dv}\left[w_+y_+^2 + (1-w_+)y_-^2\right] = 4z + 4\frac{dz}{dv} \tag{4.21} $$

and that $w_+y_+^2 + (1-w_+)y_-^2 = 1$ if $v = 1$, so that

$$ \Sigma(v) = z(v) + \frac{1}{z(v)}\int_1^v z(x)\,dx + 1 - \frac{1}{4z(v)} - v - \log(2z(v)) . \tag{4.22} $$

Now the problem arises. From the previous analysis of $z(v)$ we know that, if $1 < v < v_c$, then $z(v) < \frac{1}{2}$ so that

$$ \Sigma(v_c) = 1 - v_c + 2\int_1^{v_c} z(x)\,dx < 0 . \tag{4.23} $$

Furthermore, we find that

$$ \frac{d\Sigma}{dv} = \left[1 - \frac{1}{4z^2}\left(w_+y_+^2 + (1-w_+)y_-^2\right)\right]\frac{dz}{dv} = \frac{-w_+(1-w_+)(y_+-y_-)^2}{4z^2}\,\frac{dz}{dv} , \tag{4.24} $$

so that also $d\Sigma/dv < 0$ in $v_c$. So there clearly is a region in $[1,v_c]$ where $dz/dv > 0$ and $\Sigma(v) < 0$. This means that in the region $\frac{1}{2}w_{\min} < z < \frac{1}{2}$ there are instanton solutions with negative action. The situation is shown in Fig. 4.2 for $w_{\min} = 0.09$. A region where $dz/dv > 0$ and $\Sigma(v) < 0$ is clearly visible in $[1,v_c]$.
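
The argument above can be followed numerically. The sketch below is an illustration under stated assumptions, not the computation used for Fig. 4.2: it takes one co-ordinate with weight $w_+ = 0.09$ at $y_+$ and all others at $y_-$, solves $e^{y}/y = e^{v}$ for both branches by bisection, locates $v_c$ from $2z(v_c) = 1$, and compares the action at $v_c$ with $1 - v_c + 2\int_1^{v_c} z(x)\,dx$. Function names, step sizes and the trapezoidal rule are choices made here for illustration.

```python
import math

def branch(v, upper, iterations=60):
    """Solve y - log(y) = v, i.e. exp(y)/y = exp(v), by bisection.

    upper=True gives y_+ >= 1, upper=False gives y_- <= 1.
    """
    g = lambda y: y - math.log(y) - v
    if upper:
        lo, hi = 1.0, 2.0 * v          # g(lo) <= 0 < g(hi) for v >= 1
    else:
        lo, hi = 1.0, math.exp(-v)     # same signs, with "hi" on the left
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def two_z(v, w_plus):
    """2z = w_+ y_+ + (1 - w_+) y_-, from the constraint sum_n w_n y_n = 2z."""
    return w_plus * branch(v, True) + (1.0 - w_plus) * branch(v, False)

def action(v, w_plus):
    """Action at the extremal point labeled by v (log sum w e^y = v + log 2z)."""
    y_p, y_m = branch(v, True), branch(v, False)
    z = 0.5 * (w_plus * y_p + (1.0 - w_plus) * y_m)
    q = w_plus * y_p ** 2 + (1.0 - w_plus) * y_m ** 2
    return z + q / (4.0 * z) - v - math.log(2.0 * z)

def critical_v(w_plus, step=0.01):
    """First v > 1 with 2z(v) = 1; assumes w_plus < 1/2."""
    v = 1.0 + step
    while two_z(v, w_plus) < 1.0:
        v += step
    lo, hi = v - step, v
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if two_z(mid, w_plus) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def integral_of_z(v_end, w_plus, n=4000):
    """Trapezoidal estimate of the integral of z(x) over [1, v_end]."""
    h = (v_end - 1.0) / n
    total = 0.25 * (two_z(1.0, w_plus) + two_z(v_end, w_plus))
    for j in range(1, n):
        total += 0.5 * two_z(1.0 + j * h, w_plus)
    return total * h

w_plus = 0.09
v_c = critical_v(w_plus)
sigma_c = action(v_c, w_plus)
rhs_423 = 1.0 - v_c + 2.0 * integral_of_z(v_c, w_plus)
print("v_c =", v_c, " action at v_c =", sigma_c, " 1 - v_c + 2*int z =", rhs_423)
```

Both numbers come out negative and agree within the quadrature error, which is the content of the inequality for the action at $v_c$.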



4.3 Instantons for the $L_2^*$-discrepancy



In order to investigate the instantons for the $L_2^*$-discrepancy in one dimension, we analyze $\Sigma[\phi] := S[\phi/g]/N$, with $S$ as in Eq. (4.4), because this new action does not depend on $N$:



m = ^((ct)')') + lMclj(1)2-log(e*) , 



(4.25) 



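where $2z = Ng^2$ and $M \to \infty$. Extremal points of this action are solutions of the field equation

$$ -\frac{1}{2z}\,\phi''(x) = \frac{e^{\phi(x)}}{\langle e^{\phi}\rangle} - 1 \tag{4.26} $$

that also satisfy the boundary conditions, which are $\phi(0) = \phi'(1) = 0$ at this point. We proceed however by applying the gauge transformation $T : \phi \mapsto \phi - \log\langle e^{\phi}\rangle$, so that $\langle e^{\phi}\rangle = 1$ and, in this gauge, the equation becomes

$$ -\frac{1}{2z}\,\phi''(x) = e^{\phi(x)} - 1 , \quad\text{with}\quad \phi'(1) = 0 \quad\text{and}\quad \langle e^{\phi}\rangle = 1 . \tag{4.27} $$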



Integration over $K$ of this equation leads to the identity $\phi'(0) = 0$. The problem is now reduced
to that of the motion of a classical particle with a mass 1 / V4z in a potential 

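$$ U(\phi) = e^{\phi} - \phi - 1 , \tag{4.28} $$

and the solution can be written implicitly as

$$ \frac{1}{\sqrt{4z}} \int_{\phi(0)}^{\phi(x)} \frac{d\varphi}{\sqrt{E - U(\varphi)}} = x , \tag{4.29} $$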



where the integration constant $E$, the energy, has to be larger than zero for solutions to exist. It is easy to see that the solutions are oscillatory and that, if $\phi(x)$ is a solution with one bending point, then also

$$ \phi_k(x) = \begin{cases} \phi(kx - p) & \frac{p}{k} \le x \le \frac{p+1}{k} ,\; p \text{ even} \\ \phi(1 + p - kx) & \frac{p}{k} \le x \le \frac{p+1}{k} ,\; p \text{ odd} \end{cases} , \quad p = 0,1,\ldots,k-1 , \tag{4.30} $$

is a solution for $k = 2,3,\ldots$. These new solutions have the same energy, but a larger number of bending points, namely $k$, and the value of $z$ increases by a factor $k^2$. Hence, we can classify the solutions according to the energy and the number of bending points.

Figure 4.3: Instanton solutions $\phi_k(x)$ with $E = 5.7$ and number of bending points $k = 1, 2, 3$.

This classification in terms of the number of bending points is quite natural and this can best be understood by looking at the limit of $N \to \infty$. Then, the equation becomes

-cl)"(x) -2z$(x) = , (4.31) 

with $\phi(0) = \phi'(1) = 0$, and the solutions are given by

4)^(x] = -^1 [1 -cos(k7rx)] , 2z = k^7r^ , k=l,2,... , (4.32) 

so that the instantons are completely classified with the number of bending points k. If N be- 
comes finite, these solutions are deformed but keep the same value of k (Fig. 4.3). For given k 
there are infinitely many solutions classified by E. 



4.3.1 Existence of instantons 

We now concentrate on the instantons with one bending point, because the numerical value of the action is independent of the number of bending points. Those instantons are completely characterized by their energy. The values of $z$ for which these instantons exist are defined as a function of $E$ by Eq. (4.29), which states that

$$ T(E) := \sqrt{4z} = \int_{\phi_-}^{\phi_+} \frac{d\phi}{\sqrt{E - U(\phi)}} , \tag{4.33} $$

where $\phi_-$ and $\phi_+$ are the classical turning points. They are solutions of $U(\phi_\pm) = E$ with $\phi_- < 0 < \phi_+$. In classical mechanics, $T(E)$ is proportional to the period of a particle in the potential $U$ (cf. [13]).

The function $T$ cannot be expressed in terms of elementary functions, but a number of its properties can be derived, as we shall now discuss. For small $E$, a quadratic approximation of the potential can be made with $\phi_\pm = \pm\sqrt{2E}$ with the result that

$$ \lim_{E\downarrow 0} T(E) = \pi\sqrt{2} \quad\Longrightarrow\quad \lim_{E\downarrow 0} z(E) = \frac{\pi^2}{2} . \tag{4.34} $$

The question is now whether $z$ is increasing as a function of $E$. To calculate $T(E)$ for large $E$, $U(\phi)$ can be approximated by $-1 - \phi$ for $\phi < 0$ and by $e^{\phi}$ for $\phi > 0$, so that

$$ T(E \to \infty) \approx 2\sqrt{E+1} + \frac{2}{\sqrt{E}}\log\left(\sqrt{E} + \sqrt{E-1}\right) , \tag{4.35} $$

so $T(E)$ is clearly increasing for large $E$. To analyze $T(E)$ for small $E$, we make an expansion in powers of $E$. Therefore, we write

$$ T(E) = \int_0^{\sqrt{2E}} \left(E - \tfrac{1}{2}v^2\right)^{-1/2} \frac{d}{dv}\left[f(v) - f(-v)\right] dv , \tag{4.36} $$



where $f$ is a continuous solution of the implicit equation

$$ e^{f(v)} - f(v) - 1 = \tfrac{1}{2}v^2 , \tag{4.37} $$

with $f(v) \sim v$ for small $v$. In Section 4.4 it is shown that it is given by the function values on the principal Riemann sheet of the general continuous solution and that it has an expansion $f(v) = \sum_{n=0}^{\infty} \alpha_nv^n$ with the coefficients $\alpha_n$ given by

$$ \alpha_0 = 0 , \quad \alpha_1 = 1 \quad\text{and}\quad \alpha_n = -\frac{1}{n+1}\left[\frac{n-1}{2}\,\alpha_{n-1} + \sum_{k=2}^{n-1} k\,\alpha_k\alpha_{n+1-k}\right] \quad\text{for } n > 1 , \tag{4.38} $$

and with the radius of convergence equal to $\sqrt{4\pi}$. If we substitute the power series into Eq. (4.36) and integrate term by term, we obtain the following power series for $|E| < 2\pi$:



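$$ T(E) = \sqrt{2\pi} \sum_{n=1,\ n\ \mathrm{odd}}^{\infty} n\,\alpha_n\,\frac{\Gamma(n/2)}{\Gamma((n+1)/2)}\,(2E)^{(n-1)/2} . \tag{4.39} $$

The first few terms in this expansion are

$$ T(E) = \pi\sqrt{2}\left[1 + \frac{E}{12} + \frac{1}{4}\left(\frac{E}{12}\right)^2 - \frac{139}{180}\left(\frac{E}{12}\right)^3 - \frac{571}{2880}\left(\frac{E}{12}\right)^4 + \mathcal{O}(E^5)\right] . \tag{4.40} $$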



The asymptotic behavior of the coefficients $\alpha_n$ will be determined in Section 4.4, with the result that, for large and integer $k$,



Otr 



1 



(47t) 2 n2 



X < 





if n 


= 4k 





if n 


= 4k+1 


' -2(-)^ 


if n 


= 4k + 2 




if n 


= 4k + 3 



(4.41) 



The results are summarized in Fig. 4.4. Depicted are the behavior for large $E$, the expansion for small $E$ and a numerical evaluation of the integral of Eq. (4.33). Notice the strong deviation of the expansion from the other curves for $E > 2\pi$, the radius of convergence. For this plot the first 50 terms were used. It appears that $T$ is indeed an increasing function of $E$.

Figure 4.4: $T(E)$ computed by numerical integration, as an expansion around $E = 0$ and as an approximation for large $E$. The expansion uses the first 50 terms.
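
A numerical evaluation of Eq. (4.33) can be sketched as follows. This is an illustration only: it assumes the potential $U(\phi) = e^{\phi} - \phi - 1$, finds the turning points by bisection and removes the inverse square-root end-point singularities with the substitution $\phi = c + h\cos\theta$, which is a choice made here and not necessarily the method behind Fig. 4.4. The comparison functions are the first terms of the small-$E$ expansion and the large-$E$ estimate quoted above.

```python
import math

def potential(phi):
    """U(phi) = exp(phi) - phi - 1, written with expm1 to limit cancellation."""
    return math.expm1(phi) - phi

def turning_points(energy, iterations=200):
    """Return (phi_minus, phi_plus) with U(phi_pm) = energy, by bisection."""
    lo, hi = -(energy + 1.0), 0.0            # U(lo) > E > U(hi) = 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if potential(mid) > energy:
            lo = mid
        else:
            hi = mid
    phi_minus = 0.5 * (lo + hi)
    lo, hi = 0.0, math.log1p(energy) + 1.0   # U(lo) = 0 < E < U(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if potential(mid) > energy:
            hi = mid
        else:
            lo = mid
    return phi_minus, 0.5 * (lo + hi)

def period_integral(energy, n=2000):
    """T(E): integral of 1/sqrt(E - U) between the turning points."""
    a, b = turning_points(energy)
    c, h = 0.5 * (a + b), 0.5 * (b - a)
    total = 0.0
    for j in range(n):
        theta = math.pi * (j + 0.5) / n      # midpoint rule avoids the end points
        phi = c + h * math.cos(theta)
        total += h * math.sin(theta) / math.sqrt(energy - potential(phi))
    return total * math.pi / n

def small_energy_series(energy):
    """First terms of the expansion of T(E) around E = 0."""
    return math.pi * math.sqrt(2.0) * (
        1.0 + energy / 12.0 + energy ** 2 / 576.0 - 139.0 * energy ** 3 / 311040.0)

def large_energy_estimate(energy):
    """Large-E approximation of T(E)."""
    root = math.sqrt(energy)
    return 2.0 * math.sqrt(energy + 1.0) + 2.0 / root * math.log(root + math.sqrt(energy - 1.0))

for e in (0.001, 0.5, 10.0, 50.0):
    print(e, period_integral(e))
```

For small $E$ the result approaches $\pi\sqrt{2}$, and the values grow with $E$, in line with the statement that $T$ is increasing.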



4.3.2 The wall 

We now turn to the analysis of the value of the action for an instanton. In the foregoing, we 
have shown for which positive values of z no instantons exist. Now we will show that the action 
indeed becomes negative for z positive and large enough. For an instanton solution with one 
bending point, the action is given by 



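$$ S(E) = \frac{1}{4z(E)}\int_0^1 \phi'(x)^2\,dx + \int_0^1 \phi(x)\,dx = E + 2\,\frac{T_1(E)}{T(E)} , \tag{4.42} $$

$$ T_1(E) = \int_{\phi_-}^{\phi_+} \frac{\phi\,d\phi}{\sqrt{E - U(\phi)}} . \tag{4.43} $$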


With the use of the same approximations for $U(\phi)$ as in the derivation of Eq. (4.35), it is easy to see that, for large $E$, $T_1(E)$ is bounded by

$$ T_1(E) \le -\frac{4}{3}(E+1)^{3/2} + \frac{2\log E}{\sqrt{E}}\log\left(\sqrt{E} + \sqrt{E-1}\right) , \tag{4.44} $$

so that $S(E)$ clearly becomes negative for large $E$.

To investigate the behavior of S (E) for small E, we use an expansion again. It can be obtained 
using Eq. (4.42) and the relation 



dE 



1 



■TfEl 



(4.45) 
Figure 4.5: $S(E)$ computed by numerical integration and as an expansion around $E = 0$. The radius of convergence of the expansion is $2\pi$. The curve for large $E$ is the upper bound of Eq. (4.44).



A derivation of this relation is given in Appendix 4B. For $E \downarrow 0$ a quadratic approximation of the potential $U(\phi)$ can be used in Eq. (4.43) and we find that $T_1(0) = 0$, so that the expansion of $T(E)$ can be substituted in Eq. (4.45) and an expansion of $T_1(E)$ can be obtained by integrating term by term. The expansions of $T(E)$ and $T_1(E)$ can then be used to find the expansion of $S(E)$ using Eq. (4.42). The first few terms are

$$ T_1(E) = \pi\sqrt{2}\left[-\frac{E}{2} - \frac{E^2}{16} - \frac{5E^3}{3456} + \frac{973\,E^4}{2488320} + \mathcal{O}(E^5)\right] , \tag{4.46} $$

$$ S(E) = -\frac{E^2}{24} + \frac{E^3}{432} + \frac{89\,E^4}{414720} + \mathcal{O}(E^5) . \tag{4.47} $$



In Fig. 4.5, we plot S(E) as obtained from the series expansion, from the asymptotic behavior, 
and from numerical integration. The conclusion is that S(E) is always negative. 
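
The sign of the action can also be checked directly from the two integrals. The following sketch is illustrative and makes the same assumptions as a plain quadrature of $T(E)$: potential $U(\phi) = e^{\phi} - \phi - 1$, turning points from bisection, and the substitution $\phi = c + h\cos\theta$ to tame the end-point singularities. It evaluates $S(E) = E + 2T_1(E)/T(E)$ and compares it with the leading terms of the small-$E$ series. All names are hypothetical.

```python
import math

def potential(phi):
    """U(phi) = exp(phi) - phi - 1."""
    return math.expm1(phi) - phi

def turning_points(energy, iterations=200):
    """Return (phi_minus, phi_plus) with U(phi_pm) = energy, by bisection."""
    lo, hi = -(energy + 1.0), 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if potential(mid) > energy:
            lo = mid
        else:
            hi = mid
    phi_minus = 0.5 * (lo + hi)
    lo, hi = 0.0, math.log1p(energy) + 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if potential(mid) > energy:
            hi = mid
        else:
            lo = mid
    return phi_minus, 0.5 * (lo + hi)

def orbit_integral(energy, weight, n=2000):
    """Integral of weight(phi) / sqrt(E - U(phi)) between the turning points."""
    a, b = turning_points(energy)
    c, h = 0.5 * (a + b), 0.5 * (b - a)
    total = 0.0
    for j in range(n):
        theta = math.pi * (j + 0.5) / n
        phi = c + h * math.cos(theta)
        total += weight(phi) * h * math.sin(theta) / math.sqrt(energy - potential(phi))
    return total * math.pi / n

def instanton_action(energy):
    """S(E) = E + 2 T_1(E) / T(E) for an instanton with one bending point."""
    t = orbit_integral(energy, lambda phi: 1.0)
    t1 = orbit_integral(energy, lambda phi: phi)
    return energy + 2.0 * t1 / t

def small_energy_series(energy):
    """Leading terms of the expansion of S(E) around E = 0."""
    return -energy ** 2 / 24.0 + energy ** 3 / 432.0 + 89.0 * energy ** 4 / 414720.0

for e in (0.1, 0.3, 1.0, 5.0, 20.0):
    print(e, instanton_action(e))
```

The printed values are negative for all sampled energies, consistent with the conclusion drawn from Fig. 4.5.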



4.4 Computer-aided analysis of Riemann sheet structures 

In the previous section, we encountered the problem of finding solutions to the implicit function equation (4.37), or at least series expansions of solutions. It can be classified as a particular case of slightly more general problems one encounters in theoretical physics that are formulated as follows: consider an entire function $F : \mathbb{C} \to \mathbb{C}$ such that

$$ F(y) \sim y^m \quad\text{as}\quad y \to 0 , \tag{4.48} $$

with nonnegative integer $m$ (in practice, we have met cases with $m = 1$ and $m = 2$). The task at hand is then to find information about $y : \mathbb{C} \to \mathbb{C}$ such that

$$ F(y(x)) = x^m . \tag{4.49} $$

In general, both the form of the series expansion of $y(x)$ around $x = 0$ and the nature of its singularities are of interest. Apart from Section 4.3, such questions arise, for instance, in the combinatorial problem of determining the number of Feynman diagrams contributing to given scattering amplitudes in various quantum field theories [27], in the statistical bootstrap model for hot hadronic matter (refs. in [28]), and in renormalization theory connected with the 't Hooft transformation [29]. An important and interesting example, studied in detail in [28], is the so-called bootstrap equation:

$$ F_b(y) = 2y + 1 - e^{y} , \tag{4.50} $$

which obviously has $m = 1$. We shall consider functions $F$ of the more general form

$$ F(y) = P(y) + Q(y)e^{y} , \tag{4.51} $$

where $P$ and $Q$ are polynomials of finite degree $d_P > 0$ and $d_Q \ge 0$, respectively, with real coefficients. As our working example, taken from Section 4.3, we shall consider the function $F_w$ defined as

$$ F_w(y) = -2 - 2y + 2e^{y} , \tag{4.52} $$

for which $m = 2$. It is, in fact, closely related to the bootstrap equation (4.50): by substituting, in Eq. (4.50), $y \to \log 2 + y$ and $x \to 2\log 2 - 1 - x^2$, we obtain Eq. (4.52). Its Riemann sheet structure, however, is quite different, as we shall see. We shall concentrate on the analysis of the Riemann sheet structure of those solutions of these equations that have a series expansion around $x = 0$. To determine the asymptotic behavior of these expansions, the nature of the singularities will be analyzed numerically. The results are justified by the fact that, in our calculations, only finite computer accuracy is required, as we shall demonstrate.

4.4.1 Identification of the Riemann sheets 

Table 4.1: The first few Riemann sheet solutions for $F_w(Y_s) = 0$.

      s      Y_s/π
      1      (0.0000,  0.0000)
      3      (0.6649,  2.3751)
      5      (0.8480,  4.4178)
      7      (0.9633,  6.4374)
      9      (1.0478,  8.4490)
     11      (1.1145, 10.4567)

As a first step we identify the various Riemann sheets by their value of $y(0)$: the sheet labeled $s$ will have $y(0) = Y_s$ for that sheet. Obviously, $y(0) = 0$ is a solution with multiplicity $m$. In general, there will be $d_P$ solutions if $Q(y) = 0$, and infinitely many if $Q$ is non-vanishing. It will be helpful if we can identify the Riemann sheet on which pairs $(x, y(x))$ lie when $x$ is small but nonzero. This is indeed possible, and we shall illustrate it using $F_w$. Let us write $y = \xi + i\eta$ with $\xi$ and $\eta$ real numbers. We are then looking for solutions of $F_w(\xi + i\eta) = 0$, or



Inspecting the left-hand side of the last equation, we can immediately see that its zeroes are quite nicely distributed. We can usefully enumerate them as $\mathrm{Im}(Y_s) = u_s$, where the sheet number $s$ takes only the odd integer values $\pm1, \pm3, \pm5, \ldots$. For positive $s$, the zero $u_s$ is certainly located in the interval where $\sin u_s > 0$, i.e. $(s-1)\pi < u_s < s\pi$, and $u_{-s} = -u_s$. We have $u_1 = u_{-1} = 0$, and for increasing $s$ the zero $u_s$ moves upwards in its interval, until asymptotically we have $u_s \sim a_s - (\log a_s)/a_s$ with $a_s = (s - 1/2)\pi$. In Tab. 4.1 we give the values of $Y_s$ for $F_w$, for the first few values of $s$. Because the values fall in disjoint intervals, for small $x$ we need to know $y(x)$ only to a limited accuracy in order to be able to identify its Riemann sheet. The only nontrivial case is that of sheets $-1$ and $1$, where it is sufficient to consider the complex arguments: for $\arg(x) - \arg(y) = 0$ we are on sheet 1, for $|\arg(x) - \arg(y)| = \pi$ we are on sheet $-1$. Again, limited computer accuracy is acceptable here, and for larger $m$ we simply have $m$ different values of the argument, distinguished in an analogous manner. Note that of course the labeling of the sheets is rather arbitrary: we have chosen the odd integers in order to emphasize that both sheet 1 and $-1$ can be considered the principal Riemann sheet. For the bootstrap equation (4.50) it is more natural to label the single principal Riemann sheet with $y(0) = 0$ as sheet number zero.
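
The entries of Tab. 4.1 can be reproduced with a few Newton-Raphson steps on $F_w(Y) = 0$. The sketch below is an illustration: the starting guess combines the asymptotic estimate for $u_s$ quoted above with the assumed guess $\mathrm{Re}\,Y_s \approx \log u_s$, and the function names are hypothetical.

```python
import cmath
import math

def f_w(y):
    """Working example F_w(y) = -2 - 2y + 2 exp(y)."""
    return 2.0 * cmath.exp(y) - 2.0 * y - 2.0

def df_w(y):
    """Derivative F_w'(y) = 2 exp(y) - 2."""
    return 2.0 * cmath.exp(y) - 2.0

def sheet_origin_value(s, iterations=50):
    """Newton-Raphson for F_w(Y_s) = 0 on sheet s = 3, 5, 7, ..."""
    a = (s - 0.5) * math.pi
    eta = a - math.log(a) / a            # asymptotic estimate of Im(Y_s)
    y = complex(math.log(eta), eta)      # assumed starting guess for Re(Y_s)
    for _ in range(iterations):
        y = y - f_w(y) / df_w(y)
    return y

for s in (3, 5, 7, 9, 11):
    y_s = sheet_origin_value(s)
    print(s, round(y_s.real / math.pi, 4), round(y_s.imag / math.pi, 4))
```

Sheet 1 is omitted because $Y_1 = 0$ is a double zero of $F_w$, where the plain Newton step is not appropriate.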

4.4.2 Series expansion 

We want to compute y (x) as a Taylor series around x = 0: 




(4.53) 



(4.54) 




$$ y(x) = \sum_{n\ge0} \alpha_nx^n . \tag{4.55} $$



Obviously, $\alpha_0$ can be chosen as one of the $Y_s$ above. On principal sheets, with $\alpha_0 = 0$, we also have immediately that $\alpha_1$ must be chosen out of the $m$ possibilities with $\alpha_1^m = 1$. The other coefficients must then be computed (algebraically or numerically) by some recursive method, which we shall now discuss.

It would be straightforward to plug the expansion (4.55) into Eq. (4.49) and equate the powers of $x$ on both sides, but notice that, for $Q$ non-vanishing, the number of possible products of coefficients grows very rapidly, so that the computer time needed to find the first $N$ coefficients grows exponentially with $N$. As already mentioned in [28], the better way is to differentiate Eq. (4.49) with respect to $x$ so that we obtain the nonlinear differential equation

$$ y'(x)\left[P'(y)Q(y) + \big(Q(y) + Q'(y)\big)\big(x^m - P(y)\big)\right] = m\,x^{m-1}Q(y) . \tag{4.56} $$

This equation yields a recursion relation involving products of at most $d_P + d_Q + 1$ coefficients, so that a truncated power series can be computed in polynomial time. As an example, for $F_w$ we find the following differential equation:

$$ y'(x)\left(x^2 + 2y(x)\right) = 2x , \tag{4.57} $$

and the following recursion relation:

$$ \alpha_0\alpha_1 = 0 , \quad 2\alpha_0\alpha_2 + \alpha_1^2 - 1 = 0 , $$
$$ 2n\alpha_0\alpha_n + (n-2)\alpha_{n-2} + 2\sum_{p=1}^{n-1} p\,\alpha_p\alpha_{n-p} = 0 , \quad n \ge 3 . \tag{4.58} $$
p=i 

We see immediately that $y(x)$ is necessarily even in $x$ if $\alpha_0 \ne 0$, i.e. on the non-principal Riemann sheets. In that case, we also see that if $\alpha_n$, $n = 0,2,\ldots$ is a solution, then also $\alpha_n^*$, $n = 0,2,\ldots$ is a solution, where the asterisk stands for complex conjugation. This is a result of the fact that if $y(x)$ is a solution of Eq. (4.52), then also $y^*(x^*)$ is a solution. In practice, these solutions give the function values on the different Riemann sheets of one solution. The analysis of the previous section proves that $y_s(0) = y_{-s}(0)^*$ so that the solutions satisfy $y_s^*(x) = y_{-s}(x^*)$ and the expansion coefficients satisfy

$$ \alpha_n^{(s)} = \left(\alpha_n^{(-s)}\right)^* . \tag{4.59} $$

On the principal Riemann sheets we have $\alpha_0 = 0$ and $\alpha_1^2 = 1$ as mentioned, and the two solutions on sheet 1 and sheet $-1$ are related by $y_{-1}(x) = y_1(-x)$. For $y_1(x)$ we find, finally:

$$ \alpha_n = -\frac{1}{2(n+1)}\left[(n-1)\alpha_{n-1} + 2\sum_{p=2}^{n-1} p\,\alpha_p\alpha_{n+1-p}\right] \tag{4.60} $$

for $n \ge 2$. Using this relation we have been able to compute many thousands of terms. The recursion appears to be stable in the forward direction, but we have not tried to prove this or examine the stability in the general case.

In series expansions it is of course always important to know the convergence properties or, equivalently, the asymptotic behavior of $\alpha_n$ as $n$ becomes very large. In the next section, we therefore turn to the singularity structure of $y(x)$.
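
The recursion for the principal sheet is short enough to be run as written. The following sketch is an illustration with exact rational arithmetic: it generates the coefficients of $y_1(x)$ from $\alpha_0 = 0$, $\alpha_1 = 1$ and the relation for $n \ge 2$, and then checks that the truncated series satisfies $F_w(y(x)) = x^2$ at a point well inside the radius of convergence. The truncation order and the test point are arbitrary choices made here.

```python
from fractions import Fraction
import math

def principal_coefficients(n_max):
    """alpha_0 .. alpha_{n_max} on sheet 1 from the recursion for n >= 2."""
    alpha = [Fraction(0), Fraction(1)]
    for n in range(2, n_max + 1):
        conv = sum(p * alpha[p] * alpha[n + 1 - p] for p in range(2, n))
        alpha.append(-((n - 1) * alpha[n - 1] + 2 * conv) / (2 * (n + 1)))
    return alpha

def series_value(alpha, x):
    """Evaluate the truncated series sum_n alpha_n x^n in floating point."""
    return sum(float(a) * x ** n for n, a in enumerate(alpha))

alpha = principal_coefficients(30)
x = 0.5
y = series_value(alpha, x)
residual = 2.0 * math.exp(y) - 2.0 * y - 2.0 - x * x
print([str(a) for a in alpha[:6]], residual)
```

Each new coefficient costs a sum over about $n$ products, so the first $N$ coefficients take of the order of $N^2$ operations, which is the polynomial cost referred to above.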
4.4.3 Singularities and branches

In order to find information about the singularity structure of $y(x)$, we employ the techniques developed in [27], which we recapitulate here. Singularities are situated at those values $y_k$ of $y$ where

$$ F'(y_k) = 0 . \tag{4.61} $$

Since $F$ is entire we also know that these singular points must form an enumerable set, i.e. we can find, and label, them as distinct points. We shall assume that these singularities are square-root branch points, for which it is necessary that

$$ F''(y_k) \ne 0 . \tag{4.62} $$

If $F''$ vanishes at $y_k$ but $F'''$ does not, we have a cube-root branch point, and so on. If, for non-vanishing $Q$, all derivatives vanish (as for instance when $F(y) = e^{y}$) we have, of course, a logarithmic branch point. We know that $y \to -\infty$ corresponds to a logarithmic branch point, and it is to remove this to infinity in the $x$ plane that we have required $d_P > 0$. In our examples all the singularities at finite $x$ will be square-root branch points. The position of the singularity in the $x$ plane, $x_k$, is of course given by

$$ F(y_k) = x_k^m , \tag{4.63} $$

so that there are $m$ different possible positions, lying equally spaced on a circle around the origin. We shall denote them by $x_{k,p}$ with $p = 1,2,\ldots,m$. Note that, in first instance, it is not clear at all whether $x_{k,p}$ for certain $k$ and $p$ is indeed a singular point on a specific Riemann sheet. Later on, we shall describe how to determine this numerically. For values of $x$ close to an observed singular point $x_{k,p}$ we may expand the left-hand and right-hand side of Eq. (4.49) to obtain

$$ \frac{1}{2}(y - y_k)^2F''(y_k) \sim m\,F(y_k)\left(\frac{x}{x_{k,p}} - 1\right) , \tag{4.64} $$

where we have dropped the higher derivative terms. Very close to the branch point we may therefore approximate $y(x)$ by

$$ y(x) \sim y_k + \beta_{k,p}\left(1 - \frac{x}{x_{k,p}}\right)^{1/2} , \qquad \beta_{k,p}^2 = -\frac{2m\,F(y_k)}{F''(y_k)} . \tag{4.65} $$

Note that there are only two possible values for $\beta_{k,p}$, and each singular point $x_{k,p}$ goes with one or the other of these. Again numerical methods will help in determining which one of the two is the correct choice.

We are now in a position to compute the asymptotic behavior of the coefficients $\alpha_n$. To find it, we first determine, for a given Riemann sheet, which are the $x_{k,p}$ that lie closest to the origin: this gives us the radius of convergence of the expansion of $y(x)$ in that Riemann sheet. We then have to determine those $p$ for which $x_{k,p}$ is actually a singular point. We shall do this numerically, in the way described in the following section. Let us denote the set of values of $p$ for which this is the case by $\mathbf{P}$. Now, we may use the fact that

$$ \sqrt{1-x} = 1 - \sum_{n\ge1} \frac{(2n-2)!}{2^{2n-1}\,n!\,(n-1)!}\,x^n , \qquad \frac{(2n-2)!}{2^{2n-1}\,n!\,(n-1)!} \sim \frac{1}{2\sqrt{\pi}\,n^{3/2}} , \tag{4.66} $$

where we have chosen that square root that is real and positive for $1 - x$ real and positive. The asymptotic behavior of $\alpha_n$ as $n \to \infty$ must therefore be given by

$$ \alpha_n \sim -\frac{1}{2\sqrt{\pi}\,n^{3/2}} \sum_{p\in\mathbf{P}} \frac{\beta_{k,p}}{x_{k,p}^{\,n}} . \tag{4.67} $$

Amongst other things, this provides a powerful numerical check on the accuracy of the $\alpha_n$ as computed by the recursive technique. We shall now discuss how the singularity structure of our problem can be investigated numerically.


4.4.4 Computer searches for sheet structures 

The main tool we use for our computer studies is a method for taking small steps over a Riemann sheet, that is, given the fact that for some value $x_1$ the point $y_1 = y(x_1)$ is determined to belong to a certain Riemann sheet, we perform a small step $\Delta x$ to a point $x_2$ and find the point $y_2 = y(x_2)$ on the same Riemann sheet. Our method to do this is nothing but Newton-Raphson iteration: we simply iterate the mapping

$$ y \leftarrow y - \frac{F(y) - x_2^m}{F'(y)} \tag{4.68} $$

until satisfactory convergence is obtained. The starting value for this iteration is just the point $y_1$. A few remarks are in order here. In the first place, it must be noted that for this method to work, $y_1$ must be in the basin of attraction of $y_2$. Since, except at the branch points, which we shall expressly avoid, $y(x)$ is a continuous and differentiable function of $x$, this can always be arranged by taking $\Delta x$ small enough. In the second place, the accuracy with which $y_1$ is actually a solution of Eq. (4.49) is not important as long as it is in the basin of attraction of $y_2$: therefore, there is no buildup of numerical errors in this method if we restrict ourselves to just keeping track of which Riemann sheet we are on. Finally, problems could arise if two Riemann sheet values of $y$ for the same $x$ are very close. But, since $F$ is an entire function, we know that the solutions of Eq. (4.49) must either completely coincide or be separated by a finite distance, so that any inadvertent jump from one sheet to another can be detected and cured by, again, taking a small enough $\Delta x$.

We have applied the following method for detecting and characterizing the various singular points. We start on a Riemann sheet $s_1$ at a value $x$ close to zero, and determine $y(x)$ on that Riemann sheet. We then let the parameter $x$ follow a prescribed contour that circles a selected would-be singularity $x_{k,p}$ once (and no other singularities), and then returns to the starting point close to the origin. We then determine to which Riemann sheet the resulting $y$ belongs. In this way we can find whether $x_{k,p}$ is, in fact, a singular point for the starting sheet, and, if so, which two sheets are connected there. It is also possible, of course, to certify the square-root branch point nature of a singular point by circling twice around it, and checking that one returns to the original Riemann sheet.

Figure 4.6: The numbering $(k,p)$ of the singularities.
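
The tracking procedure can be sketched in a few lines for the working example $F_w$. This is an illustration under stated assumptions, not the program used for the results in the text: the contour (a radial line from $|x| = 0.3$ to a distance 0.3 from the candidate point, a circle around it, and the same line back), the step counts and the threshold used to decide that the sheet has changed are all choices made here. The candidate points are the two solutions of $x^2 = 4\pi i$, which lie at distance $\sqrt{4\pi}$ from the origin.

```python
import cmath
import math

def f_w(y):
    """Working example F_w(y) = -2 - 2y + 2 exp(y), with m = 2."""
    return 2.0 * cmath.exp(y) - 2.0 * y - 2.0

def df_w(y):
    return 2.0 * cmath.exp(y) - 2.0

def newton(x, y, iterations=30):
    """Newton-Raphson for F_w(y) = x^2, started from the previous value y."""
    for _ in range(iterations):
        y = y - (f_w(y) - x * x) / df_w(y)
    return y

def straight(a, b, n):
    """n points on the segment from a (excluded) to b (included)."""
    return [a + (b - a) * (j + 1) / n for j in range(n)]

def loop(center, start, turns, n_per_turn):
    """Points on a circle around center, beginning and ending at start."""
    return [center + (start - center) * cmath.exp(2j * math.pi * (j + 1) / n_per_turn)
            for j in range(turns * n_per_turn)]

def circle_and_return(x_sing, turns, start_modulus=0.3, radius=0.3):
    """Track sheet 1 out to x_sing, around it 'turns' times, and back."""
    direction = x_sing / abs(x_sing)
    x_start = start_modulus * direction
    x_near = x_sing - radius * direction
    y_start = newton(x_start, x_start)       # sheet 1: y ~ x close to the origin
    path = (straight(x_start, x_near, 400)
            + loop(x_sing, x_near, turns, 800)
            + straight(x_near, x_start, 400))
    y = y_start
    for x in path:
        y = newton(x, y)
    return y_start, y

z_1 = (1.0 + 1.0j) * math.sqrt(2.0 * math.pi)
candidates = [z_1, -z_1]
one_turn = [circle_and_return(x, 1) for x in candidates]
two_turns = [circle_and_return(x, 2) for x in candidates]
changes_sheet = [abs(y_end - y_start) > 0.05 for y_start, y_end in one_turn]
print("sheet changes after one turn:", changes_sheet)
```

A candidate for which the value after one turn differs from the starting value is a singular point on the starting sheet. Returning to the starting value after two turns is the check of the square-root nature described above.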

One important remark is in order here. In our tracking over the Riemann sheet, it is necessary 
that we do not cross branch cuts (except of course the one connected to the putative singularity). 
Since these branch cuts can be moved around in the complex x plane, the contour chosen defines 
the (relative) position of the branch cuts. The sheets that are said to be connected at a particular 
branch cut are therefore also determined by the choice of contour. Of course, choosing a different 
contour will change the whole system of interconnected sheets in a consistent manner, so that in 
fact, given one choice of contour and its system of sheets, we can work out what system of sheets 
will correspond to another choice of contour. We shall illustrate this in the following. 

Suppose, now, that $x_{k,p}$ is one of the singular points on a certain sheet that is closest to the origin. We can then follow, on that sheet, a straight line running from $x_1$ close to the origin to a point $x_2$ for which $x_2/x_{k,p}$ is real and just a bit smaller than one. Since $x_{k,p}$ is by assumption closest to the origin, there is then no ambiguity involved in determining which one of the two possible complex arguments of $\beta_{k,p}$ we have to take. Thus, we can find all the information needed to compute the asymptotic behavior of $\alpha_n$ on that sheet.

4.4.5 An example 

Having established the necessary machinery, we shall now discuss a concrete example of our
method. For this, we have taken the function $F_W$ of Eq.(4.52), which is closely related to the very
well-understood bootstrap equation (4.50), as we have shown. Note that the origin $x = 0$, $y = 0$
for $F_W$ corresponds to the first singularity in $F_B$.

4.4.5.1 The singularities 

The values of $y(0)$ on the different Riemann sheets for $F_W$, namely $Y_s$ for $s = \pm1, \pm3, \ldots$, have
already been discussed above. The singular values $y_k$ are simply given by

$$ F_W'(y_k) = 2e^{-y_k} - 2 = 0 \;\Rightarrow\; y_k = 2i\pi k , \tag{4.69} $$

so that the possible singular points $x_{k,p}$ satisfy

$$ x_{k,p}^2 = -4i\pi k . \tag{4.70} $$

Note that $k = 0$ does not correspond to a singular point. The positions of the possible singularities
in the complex $x$ plane are therefore as follows. For positive integer $k$:

$$ x_{k,1} = -\bar z_k , \quad x_{k,2} = \bar z_k , \quad x_{-k,1} = z_k , \quad x_{-k,2} = -z_k , \quad z_k = (1+i)\sqrt{2\pi k} . \tag{4.71} $$

At all these various possible singularities, we have

$$ \beta_{k,p}^2 = 8i\pi k , \tag{4.72} $$

and therefore we may write

$$ \text{for } k > 0: \ \beta_{k,p} = \epsilon_{k,p}(1+i)\sqrt{4\pi|k|} , \qquad \text{for } k < 0: \ \beta_{k,p} = \epsilon_{k,p}(1-i)\sqrt{4\pi|k|} , \tag{4.73} $$

where the only number to be determined is $\epsilon_{k,p} \in \{-1, 1\}$. It must be kept in mind that the value
of $\epsilon$ depends of course on the sheet: we take the convention that we work on the sheet with the
lowest number (in absolute value). When viewed from the other sheet, the value of $\epsilon$ is simply
opposite.

4.4.5.2 The Riemann sheet structure 

We now have to discuss how the branch cuts should run in the complex x plane. There are two 
simple options (and an infinity of more complicated ones): in the first option (I), we choose to let 
the branch cuts extend away from the origin parallel to the real axis. This corresponds to tracking 
a contour that, say, first moves in the imaginary direction, and then in the real direction, to arrive 
close to the chosen singularity. The other option (II) is to take the cuts parallel to the imaginary 
axis, so that a contour that does not cross branch cuts en route first goes in the real direction, 
and then in the imaginary direction. Note that these two alternatives do, indeed, correspond to 
different implied relative positionings of the branch cuts. 

In Fig.4.7.a we show the contour used in examining singularity $x_{2,1}$ under option I.

The contour starts on sheet number 1 close to the origin (so that $y$ is close to $Y_1$), moves
upwards and then to the left, circles the singularity once anti-clockwise, and returns to its starting
point by the same route in order to enable us to determine the resulting Riemann sheet number.
Fig.4.7.b shows the corresponding path in the $y$ plane. It ends again close to $Y_1$ so that, for this
choice of contour and its induced branch structure (indicated in the figure), sheet 1 does not have
a branch point at $x_{2,1}$. Fig.4.7.c shows what happens if, instead of sheet number 1, we start at
sheet number 3: the $y$ track starts then close to $Y_3$, but ends up close to $Y_5$, so that we conclude
that sheets 3 and 5 are connected at $x_{2,1}$. If we run through the whole contour twice, we get the $y$
Figure 4.7: Loops around $x_{2,1}$ under option I.
 k   x_{k,1}   eps_{k,1}   x_{k,2}   eps_{k,2}   x_{-k,1}    eps_{-k,1}   x_{-k,2}    eps_{-k,2}
 1   (1,3)     -1          (-1,3)    -1          (-1,-3)     -1           (1,-3)      -1
 2   (3,5)     -1          (3,5)     -1          (-3,-5)     -1           (-3,-5)     -1
 3   (5,7)     -1          (5,7)     -1          (-5,-7)     -1           (-5,-7)     -1
 4   (7,9)     -1          (7,9)     -1          (-7,-9)     -1           (-7,-9)     -1
 5   (9,11)    -1          (9,11)    -1          (-9,-11)    -1           (-9,-11)    -1

Table 4.2: The first few sheets and singularities (option I), and the corresponding value for $\epsilon$.



 k   x_{k,1}   eps_{k,1}   x_{k,2}   eps_{k,2}   x_{-k,1}    eps_{-k,1}   x_{-k,2}    eps_{-k,2}
 1   (1,3)     -1          (-1,3)    -1          (-1,-3)     -1           (1,-3)      -1
 2   (1,5)     -1          (-1,5)    -1          (-1,-5)     -1           (1,-5)      -1
 3   (1,7)     -1          (-1,7)    -1          (-1,-7)     -1           (1,-7)      -1
 4   (1,9)     -1          (-1,9)    -1          (-1,-9)     -1           (1,-9)      -1
 5   (1,11)    -1          (-1,11)   -1          (-1,-11)    -1           (1,-11)     -1

Table 4.3: The first few sheets and singularities (option II), and the corresponding value for $\epsilon$.



track presented in Fig.4.7.d, where the $y$ track ends up again at $Y_3$ as expected for a square root
branch cut.

Under option II, we rather use the contour indicated in Fig.4.8.a, which first moves to the left
and then upwards. Fig.4.8.b shows the resulting $y$ path, which does not return to $Y_1$ but rather to
$Y_5$, indicating that under this choice of contour the sheets labeled 1 and 5 are connected at $x_{2,1}$.
Fig.4.8.c shows that, now, sheet 3 is insensitive to this singularity.
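
The tracking procedure described above can be sketched numerically. The sketch below assumes that the example function has the form $F_W(y) = 2 - 2y - 2e^{-y}$ with defining equation $F_W(y) = x^2$; this form is an assumption, chosen because it reproduces a derivative $2e^{-y} - 2$ and singular points with $x^2 = -4i\pi k$. All function names, the starting guess and the step count are hypothetical. The value of $y$ is followed along a closed contour by Newton iteration, using each solution as the starting guess at the next contour point.

```python
import cmath

def F(y):
    # Assumed form of the example function (not taken verbatim from the text).
    return 2 - 2 * y - 2 * cmath.exp(-y)

def dF(y):
    # Derivative of the assumed F.
    return 2 * cmath.exp(-y) - 2

def solve(x, y, iterations=50, tol=1e-14):
    """Newton iteration for F(y) = x**2, started from the guess y."""
    for _ in range(iterations):
        step = (F(y) - x * x) / dF(y)
        y -= step
        if abs(step) < tol:
            break
    return y

def track(contour, y_start):
    """Follow y along a list of x values, reusing each solution as the next guess."""
    y = y_start
    for x in contour:
        y = solve(x, y)
    return y

def circle(center, start, steps=400):
    """Points on the circle through `start` around `center`, traversed once anti-clockwise."""
    radius = start - center
    return [center + radius * cmath.exp(2j * cmath.pi * j / steps) for j in range(steps + 1)]

# A value of y(0) on a higher sheet: a root of F(y) = 0 away from y = 0.
Y3 = solve(0, -2.1 + 7.5j)

# Start close to the origin on that sheet and run once around a small loop
# that encloses none of the putative singular points (all have |x| > 3).
x0 = 0.5
y0 = solve(x0, Y3)
y_end = track(circle(0, x0), y0)
```

Comparing the end value of such a track with the values of $y$ near the origin on the various sheets indicates on which sheet the contour ends. For a loop that encloses no singular point, as in this sketch, the track should return to its starting value.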

In this way we have mapped the various singularities around the origin. In Tab.4.2 we present
the pairs of sheets that are pairwise connected at the first few singularities, under option I, and the
observed value for $\epsilon$, which turns out to be $-1$ in all cases. We point out that at each singularity
only two sheets out of all infinitely many are connected. Note the somewhat atypical situation
at the lowest-lying singularities $x_{\pm1,1}$ and $x_{\pm1,2}$. The alternative option II results in Tab.4.3.
Note that the higher-lying singularities now show a sheet structure similar to the lowest ones. In
fact, this is the choice that corresponds most directly to the analysis of the sheet structure of the
bootstrap equation in [28], with of course the extra complication in the fact that the bootstrap
equation (4.50) has $m = 1$ while for $F_W$, $m = 2$. Note that, once again, $\epsilon = -1$ in all cases.



4.4.5.3 Asymptotic behavior of the series expansion coefficients 

We shall now illustrate how the information on the $x_{k,p}$ and $\beta_{k,p}$ allows us to compute the
asymptotic behavior of the series expansion coefficients $\alpha_n$.
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Figure 4.9: Tn, defined in Eq.(4.76), as function of logn. 



First Riemann sheet. In this sheet, the singularities closest to the origin, and their corresponding
$\beta$'s are

$$ x_{1,1} = \sqrt{4\pi}\,\exp(3i\pi/4) , \quad \beta_{1,1} = \sqrt{8\pi}\,\exp(-3i\pi/4) , $$
$$ x_{-1,2} = \sqrt{4\pi}\,\exp(-3i\pi/4) , \quad \beta_{-1,2} = \sqrt{8\pi}\,\exp(3i\pi/4) . \tag{4.74} $$

Using Eq.(4.67), we see that the asymptotic form of the coefficients on sheet 1 is given by 



(x: 



asym 



asym 



n3/2(47r)V2 ' 
/ 3rL7r 3n 



-V2 



cos 



V 4 



4 



(-)P n = 4p 

n = 4p + 1 

(-)P+i rL = 4p+2 

(-)pV2 rL = 4p+3 



with integer p. In Fig. 4.9 we have plotted the observed behavior of 



log 



(471)^/^/2 



asym 



(4.75) 



(4.76) 



on the first Riemann sheet, against logn. The coefficients clearly converge to the computed 
behavior, and we can even distinguish that the leading corrections go as n^^^^; the four separate 
lines that emerge are just the four different forms of Cn- The series expansion for Riemann sheet 
— 1 are simply obtained from 



Oil 



(4.77) 
Higher Riemann sheets. We first consider positive sheet label $s = 3, 5, 7, \ldots$ and put $k = (s-1)/2$. We then have

$$ x_{k,1} = -x_{k,2} = \sqrt{4\pi k}\,\exp(3i\pi/4) , \quad \beta_{k,1} = \beta_{k,2} = (1+i)\sqrt{4\pi k} . \tag{4.78} $$

As we have already seen $\alpha_n$ vanishes for odd $n$, and for even $n$ we have the following asymptotic
form: 

for integer p. For negative s, we use Eq. (4.59), which also holds asymptotically. 
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4.5 Conclusions 

For the L2-discrepancy and the Lego discrepancy, we have addressed the problem that non-trivial
extremal points of the action in the path integral representation of the generating function of
quadratic discrepancies, called instantons, might spoil the saddle point approximation around the
trivial extremal point. We have shown that instantons appear in both cases, but only if the order
parameter z of the generating function G is larger then a certain positive value. In the Lego- 
case this value is half of the size of the smallest bin, and in the L2-case it is jU^, the smallest 
positive value of $z$ at which $G(z)$ in the limit of an infinite number of random points $N$ has a
singularity. Although the instantons do not threaten the perturbation expansion, they cause $G(z)$
to be undefined for asymptotically large $N$ when the real part of $z$ is larger than the mentioned
values.

For the analyses in the L2-case, a numerical method to investigate the Riemann sheet structure 
of the solution of certain algebraic complex equations is used, which is treated in Section 4.4. 
The method is in particular suitable for the determination of the series expansions around the 
origin on the different sheets and the asymptotic behavior of their coefficients. The results of 
the numerical analyses have been justified by the fact that only finite computer accuracy was 
required in the specific calculations. 



4.6 Appendices 

Appendix 4A: Matrices of the form $A_{n,m} = a_n\delta_{n,m} + \epsilon b_nb_m$

The eigenvalues $\lambda$ of a real-valued matrix $A$ are given by the zeros of the characteristic
polynomial $P_A$. If $A$ is an $M \times M$ matrix with matrix elements

$$ A_{n,m} = a_n\delta_{n,m} + \epsilon\, b_nb_m , \quad a_n, b_n \in \mathbf{R} , \quad n = 1, \ldots, M , \quad \epsilon = \pm1 , \tag{4.80} $$

then the characteristic polynomial $P_A$ is given by

$$ P_A(x) = Q_A(x)\prod_{n=1}^{M}(a_n - x) , \quad Q_A(x) = 1 + \epsilon\sum_{m=1}^{M}\frac{b_m^2}{a_m - x} , \tag{4.81} $$

which is easily derived using $P_A(x) = \sum_{\pi}\mathrm{sgn}(\pi)\prod_{n=1}^{M}(A_{n,\pi(n)} - x\delta_{n,\pi(n)})$, where the sum is over
all permutations $\pi$ of $(1, \ldots, M)$. Without loss of generality, we assume that the coefficients $a_n$
are ordered such that $a_1 \le a_2 \le \cdots \le a_M$. If a number of $d_n$ coefficients take the same
value $a_n$, that is, if $a_n$ is $d_n$-fold degenerate, then a $(d_n - 1)$-fold degenerate eigenvalue of $A$ is
given by $\lambda = a_n$. The remaining eigenvalues are given by the zeros of the function $Q_A$. Except
for the poles at $x = a_n$, $n = 1, \ldots, M$, this function is continuous and differentiable on the
whole of $\mathbf{R}$. Furthermore, the sign of the derivative is equal to $\epsilon$. This means that for each zero
$\lambda$ of $Q_A$ except one, there is an $n$, such that $a_n < \lambda < a_{n+m}$ for the nearest and non-equal
neighbor $a_{n+m}$ of $a_n$. The one other zero is smaller than $a_1$ if $\epsilon = -1$, and larger than $a_M$ if
$\epsilon = 1$. This is easy to see because $\lim_{x\to\infty}Q_A(x) = \lim_{x\to-\infty}Q_A(x) = 1$.
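
The factorization of the characteristic polynomial and the interlacing of the eigenvalues can be checked numerically on a small case. The following is a minimal sketch; the matrix size, the coefficient values and all function names are hypothetical choices for illustration. It compares a direct determinant of $A - x$ with the product of a secular function $1 + \epsilon\sum_n b_n^2/(a_n - x)$ and $\prod_n(a_n - x)$, and locates the zeros of the secular function by bisection.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def char_poly_direct(a, b, eps, x):
    """det(A - x) for A[n][m] = a[n] * delta(n, m) + eps * b[n] * b[m]."""
    n = len(a)
    return det([[(a[i] - x if i == j else 0.0) + eps * b[i] * b[j]
                 for j in range(n)] for i in range(n)])

def q(a, b, eps, x):
    """Secular function 1 + eps * sum_n b_n^2 / (a_n - x)."""
    return 1.0 + eps * sum(bn * bn / (an - x) for an, bn in zip(a, b))

def char_poly_formula(a, b, eps, x):
    prod = 1.0
    for an in a:
        prod *= an - x
    return q(a, b, eps, x) * prod

def bisect_zero(f, lo, hi, iterations=200):
    """Zero of a monotonic function on (lo, hi), assuming a sign change."""
    f_lo_positive = f(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a = [1.0, 2.0, 4.0]     # hypothetical, non-degenerate diagonal coefficients
b = [0.5, 0.3, 0.7]     # hypothetical rank-one coefficients
eps = -1
delta = 1e-9

# For eps = -1 one zero lies below a[0]; the others lie between neighbors.
zeros = [bisect_zero(lambda x: q(a, b, eps, x), a[0] - 10.0, a[0] - delta)]
zeros += [bisect_zero(lambda x: q(a, b, eps, x), a[i] + delta, a[i + 1] - delta)
          for i in range(len(a) - 1)]
```

The list `zeros` then holds the eigenvalues of the example matrix; their sum should equal the trace $\sum_n a_n + \epsilon\sum_n b_n^2$.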
Appendix 4B: Derivation of Eq.(4.45)

We use the definitions of $T(E)$ as the r.h.s. of Eq.(4.33) and $T_1(E)$ as given in Eq.(4.43):

$$ T(E) = \int_{\phi_-}^{\phi_+}\frac{d\phi}{\sqrt{E - U(\phi)}} , \quad T_1(E) = \int_{\phi_-}^{\phi_+}\frac{\phi\,d\phi}{\sqrt{E - U(\phi)}} , \quad U(\phi) = e^{\phi} - \phi - 1 . \tag{4.82} $$

Because the end points $\phi_+$ and $\phi_-$ depend on $E$ such that $E - U(\phi_\pm) = 0$, we can use Leibniz's
rule for differentiation under the integral sign to write

$$ T(E) = 2\frac{d}{dE}\int_{\phi_-}^{\phi_+}\sqrt{E - U(\phi)}\,d\phi . \tag{4.83} $$

Now we write $\sqrt{E - U(\phi)} = (E - e^{\phi} + \phi + 1)(E - U(\phi))^{-1/2}$ and use that $1 - e^{\phi} = -dU/d\phi$,
so that

$$ T(E) = 2\frac{d}{dE}\left[ E\,T(E) + T_1(E) - \int_{\phi_-}^{\phi_+}\frac{1}{\sqrt{E - U(\phi)}}\frac{dU}{d\phi}\,d\phi \right] . \tag{4.84} $$

But the last integral is equal to zero, and as a result, we obtain Eq.(4.45).



Chapter 5 

Gaussian limits for discrepancies 



This chapter deals with the calculation of the generating function of the probability densities of
quadratic discrepancies in the limit of a large number of truly random points. These densities
depend on the dimension $s$ of the integration region, or, in the case of the Lego discrepancy, on
the number of bins $M$ into which the integration region is dissected. We will derive a 'Law of Large
Number of Modes', which describes the conditions under which these densities approach a
normal density if $s$ or $M$ becomes large. Throughout this discussion, we shall only consider the
asymptotic limit of a very large number of random points. This implies that, in this chapter, we
cannot make any statements on how the number of points has to approach infinity with respect
to $s$ or $M$, as was for instance done in [26].
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5.1 The generating function 

We want to calculate the probability density $H$ of quadratic discrepancies in the limit of
$N \to \infty$, where $N$ is the number of points in the point set. We will use the generating function
in this limit, which we denote by

$$ G_0(z) := \lim_{N\to\infty}E(e^{zD_N}) , \tag{5.1} $$

so that the probability density is given by the inverse Laplace transform of $G_0$ (Section 2.1.3). It
results in the weak limit of $H$. Starting from Eq.(3.28), it is easy to see that $G_0$ is given by

$$ G_0(z) = \int\exp\left( z\langle\phi^2\rangle + z\langle\phi\rangle^2 \right)d\mu[\phi] . \tag{5.2} $$

In the Landau gauge, the boundary conditions on the functions $\phi$ that give a contribution are
such that $\langle\phi\rangle = 0$ (notice that $G_0$ contains the same gauge freedom as $G$). We can apply the
formalism of Section 2.2.2, and conclude that $\log G_0(z)$ is equal to the sum of the contributions
of all possible connected diagrams consisting only of vertices with two legs. Consequently, they
are of the form

(diagram: a closed loop of two-leg vertices) (5.3)

and carry a symmetry factor equal to $1/2p$, where $p$ is the number of vertices. Every vertex
contributes with a factor $2z$, and represents a convolution of reduced two-point functions $\mathcal{B}$, so
that

$$ \log G_0(z) = \sum_{p=1}^{\infty}\frac{(2z)^p}{2p}R_p , \tag{5.4} $$


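with

$$ R_p := \int_{K^p}\mathcal{B}(x_1,x_2)\mathcal{B}(x_2,x_3)\cdots\mathcal{B}(x_{p-1},x_p)\mathcal{B}(x_p,x_1)\,dx_1dx_2\cdots dx_{p-1}dx_p . \tag{5.5} $$

The coefficients $R_p$ can be written in terms of the eigenfunctions $u_n$ and the eigenvalues $\sigma_n^2$ of
the two-point function $C$ interpreted as an integration kernel (Section 3.2.5). We can use the
expression of Eq.(3.57) for $\mathcal{B}$, which tells us that we have to repeatedly calculate the integral

$$ \Gamma_{n,m} := \int_K\sigma_n(u_n(x) - \langle u_n\rangle)\,\sigma_m(u_m(x) - \langle u_m\rangle)\,dx = \sigma_n^2\delta_{n,m} - \sigma_n\langle u_n\rangle\,\sigma_m\langle u_m\rangle . \tag{5.6} $$

$\Gamma$ is an infinite dimensional matrix, and the coefficients $R_p$ can be written in terms of $\Gamma$ through

$$ R_p = \mathrm{Tr}(\Gamma^p) , \tag{5.7} $$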






where $\Gamma^p$ denotes the $p$-fold matrix product, and Tr the trace, i.e., the sum over the diagonal
elements. The generating function itself can also be expressed directly in terms of $\Gamma$, since

$$ \log G_0(z) = \sum_{p=1}^{\infty}\frac{(2z)^p}{2p}\mathrm{Tr}(\Gamma^p) = \mathrm{Tr}\left(\sum_{p=1}^{\infty}\frac{(2z\Gamma)^p}{2p}\right) = -\mathrm{Tr}\left(\tfrac{1}{2}\log(1 - 2z\Gamma)\right) , \tag{5.8} $$

so that

$$ G_0(z) = \left(\det(1 - 2z\Gamma)\right)^{-1/2} , \tag{5.9} $$

where we used the well known rule that, for a general matrix $A$, $\det(e^A) = e^{\mathrm{Tr}(A)}$. In Appendix 5A,
it is shown how $G_0$ can be written in terms of the strengths $\sigma_n^2$ and the weights $\langle u_n\rangle^2$,
with the result that

$$ G_0(z) = \frac{e^{\psi(z)}}{\sqrt{\chi(z)}} , \tag{5.10} $$

where

$$ \psi(z) = -\frac{1}{2}\sum_n\log(1 - 2z\sigma_n^2) , \quad\text{and}\quad \chi(z) = 1 + \sum_n\frac{2z\sigma_n^2\langle u_n\rangle^2}{1 - 2z\sigma_n^2} . \tag{5.11} $$

Notice that if the basis is in the Landau gauge, then $\langle u_n\rangle = 0$ for all functions and $\chi(z) = 1$,
so that the generating function is just given by $\prod_n(1 - 2z\sigma_n^2)^{-1/2}$. In the Landau gauge, the
matrix $\Gamma$ is diagonal, so that this result follows directly from Eq.(5.9). If the eigenvalues of $\Gamma$, in
a general gauge, are denoted $\lambda_n$, then the generating function is given by $\prod_n(1 - 2z\lambda_n)^{-1/2}$.
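
As a quick numerical illustration of the two equivalent forms of the generating function, the sketch below evaluates its logarithm once as a sum of logarithms over eigenvalues and once as the power series in $2z$ with coefficients given by traces of matrix powers. The spectrum and the value of $z$ are hypothetical, and the function names are not taken from the text; the mean and variance are recovered from numerical derivatives at $z = 0$.

```python
import math

def log_g0_product(z, eigenvalues):
    """log G0(z) = -1/2 * sum_n log(1 - 2 z lambda_n)."""
    return -0.5 * sum(math.log(1.0 - 2.0 * z * lam) for lam in eigenvalues)

def log_g0_series(z, eigenvalues, order=60):
    """log G0(z) = sum_p (2z)^p / (2p) * R_p, with R_p = sum_n lambda_n^p."""
    total = 0.0
    for p in range(1, order + 1):
        r_p = sum(lam ** p for lam in eigenvalues)
        total += (2.0 * z) ** p / (2.0 * p) * r_p
    return total

# Hypothetical spectrum, normalized such that R_1 = 1.
eigenvalues = [0.5, 0.3, 0.2]
z = 0.4  # inside the radius of convergence: 2 * z * max(eigenvalues) < 1

# First and second cumulant from central differences of log G0 at z = 0.
h = 1e-4
mean = (log_g0_product(h, eigenvalues) - log_g0_product(-h, eigenvalues)) / (2 * h)
variance = (log_g0_product(h, eigenvalues) + log_g0_product(-h, eigenvalues)) / h ** 2
```

For this spectrum the mean should come out close to $R_1 = 1$ and the variance close to $2R_2 = 0.76$.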

5.1.1 Standardized variables and the Gaussian limit 

We have now derived the expression for $G_0(z)$ in the large-$N$ limit. Given the form of $\Gamma$, we
can now compute $H(t)$ for given discrepancy $t$, if only numerically; in fact this was done for
the L2-discrepancy in [22] for several dimensionalities. In some special cases, $H(t)$ can even be
given as an analytic expression [23, 24]. Here, however, we are interested in possible Gaussian
limits, and therefore it is useful to use the standardized variable instead of the discrepancy itself
(Section 2.1.2). Notice that the expectation value and the variance of the discrepancy are just
given by

$$ E(D_N) = R_1 \quad\text{and}\quad V(D_N) = 2R_2 . \tag{5.12} $$

The generating function $\hat G_0$ of the standardized variable is given by

$$ \hat G_0(z) = \exp\left( \frac{1}{2}z^2 + \sum_{p\ge3}\frac{(z\sqrt{2})^p}{2p}\sqrt{\gamma_p} \right) , \quad\text{with}\quad \gamma_p := \frac{R_p^2}{R_2^p} . \tag{5.13} $$

All information on the particulars of the discrepancy is now contained in the constants $\gamma_p$,
and we have that the standardized probability density approaches the normal density whenever
$\gamma_p \to 0$ for all $p \ge 3$. It remains to examine under what circumstances this can happen.
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5.1.2 A Law of Large Number of Modes 

Since we know that $G_0(z)$ has no singularities for negative values of $\mathrm{Re}\,z$, the eigenvalues of $\Gamma$
are also nonnegative, and we may write

$$ \mathrm{Tr}(\Gamma^k) = \sum_n\lambda_n^k , \quad \gamma_k = \frac{\left(\sum_n\lambda_n^k\right)^2}{\left(\sum_n\lambda_n^2\right)^k} , \quad \lambda_n \ge 0 , \tag{5.14} $$

where the various eigenvalues have been denoted by $\lambda_n$. Note that the sum may run over a finite
or an infinite number of eigenvalues, but all these sums must converge since $E(D_N)$ is finite.
Note, moreover, that $\gamma_k$ is homogeneous of degree zero in the $\lambda_n$: therefore, any scaling of the
eigenvalues by a constant does not influence the possible Gaussian limit (although it will, of
course, affect the mean and variance of $D_N$).

We now proceed by noting that $\gamma_{k+1} \le \gamma_k$, because

$$ \left(\sum_n\lambda_n^{k+1}\right)^2 \le \left(\sum_n\lambda_n^2\right)\left(\sum_n\lambda_n^{2k}\right) \le \left(\sum_n\lambda_n^2\right)\left(\sum_n\lambda_n^k\right)^2 , \tag{5.15} $$

where the first inequality is simply the Schwarz inequality, and the second one holds because the
$\lambda_n$ are nonnegative. This means that $\gamma_k$ will approach zero for $k > 3$, whenever $\gamma_3$ approaches
zero. To see when this happens we define

$$ \rho_n := \frac{\lambda_n^2}{\sum_m\lambda_m^2} \quad\text{and}\quad \rho_s := \sup_n\rho_n , \tag{5.16} $$

so that $\sum_n\rho_n = 1$. It is then trivial to see that

$$ \rho_s^3 \le \gamma_3 \le \rho_s , \tag{5.17} $$

from which we derive that the necessary and sufficient condition for the discrepancy distribution,
in the limit of an infinite number of points in the point set, to approach a Gaussian is that

$$ \frac{\lambda_s^2}{\sum_n\lambda_n^2} \to 0 , \quad \lambda_s := \sup_n\lambda_n . \tag{5.18} $$

The Gaussian limit is thus seen to be equivalent to the statement that even the largest eigenvalue
becomes unimportant.
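
The criterion can be tried out on explicit spectra. The sketch below assumes that the constants of the standardized variable are normalized as $\gamma_k = R_k^2/R_2^k$ with $R_k = \sum_n\lambda_n^k$ (an assumption about the normalization, chosen to be homogeneous of degree zero in the eigenvalues), and that the indicator is the squared largest eigenvalue divided by $\sum_n\lambda_n^2$. The spectra and function names are hypothetical.

```python
def gamma_k(eigenvalues, k):
    """gamma_k = (sum_n lambda_n^k)^2 / (sum_n lambda_n^2)^k (assumed normalization)."""
    r_k = sum(lam ** k for lam in eigenvalues)
    r_2 = sum(lam ** 2 for lam in eigenvalues)
    return r_k ** 2 / r_2 ** k

def indicator(eigenvalues):
    """Largest eigenvalue squared, divided by the sum of squared eigenvalues."""
    return max(eigenvalues) ** 2 / sum(lam ** 2 for lam in eigenvalues)

def geometric_spectrum(ratio, modes):
    """Hypothetical spectrum lambda_n = ratio^n, n = 0, ..., modes - 1."""
    return [ratio ** n for n in range(modes)]

def equal_spectrum(modes):
    """Equal-strength spectrum with unit sum."""
    return [1.0 / modes] * modes

spectra = {
    "geometric, ratio 0.5": geometric_spectrum(0.5, 30),
    "geometric, ratio 0.9": geometric_spectrum(0.9, 200),
    "equal, 50 modes": equal_spectrum(50),
}
summary = {name: (indicator(s), gamma_k(s, 3)) for name, s in spectra.items()}
```

In each case $\gamma_3$ should lie between the cube of the indicator and the indicator itself, so a small indicator forces a small $\gamma_3$; for the equal-strength spectrum both quantities reduce to one over the number of modes.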

Clearly, a necessary condition for this is that the total number of non-vanishing eigenvalues
(number of modes) approaches infinity. Incidentally, the condition (5.18) also implies that

$$ \sum_n\lambda_n^2 \to 0 , \tag{5.19} $$

for all those discrepancies that have $E(D_N) = \sum_n\lambda_n = 1$. This is eminently reasonable, since
a distribution centered around 1 and (by construction) vanishing for negative argument can only
approach a normal distribution if its variance approaches zero. On the other hand, the condition
$\lambda_s \to 0$ is by itself not sufficient, as proven by a counterexample given in Appendix 5B.

Another piece of insight can be obtained if we allow the eigenvalues to take on random
values. We may introduce the rather dizzying concept of an ensemble of different definitions of
discrepancy, each characterized by its set of eigenvalues (all nonnegative) $\vec\lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_M\}$,
with the usual constraint that they add up to 1; we keep $M$ finite for simplicity. A natural
probability measure on this ensemble is given by the probability density $P_\lambda(\vec\lambda)$ of the random
vector $\vec\lambda$:

$$ P_\lambda(\vec\lambda) := \Gamma(M)\,\delta\left(\sum_n\lambda_n - 1\right) . \tag{5.20} $$

Here $\Gamma$ denotes Euler's gamma-function and $\delta$ stands for the Dirac delta-distribution. It is easily
computed that the expectation and variance of $R_k = \sum_n\lambda_n^k$ are given, for large $M$, by

$$ E(R_k) \sim \frac{k!}{M^{k-1}} , \quad V(R_k) \sim \frac{(2k)! - (1 + k^2)(k!)^2}{M^{2k-1}} , \tag{5.21} $$

so that the $R_k$ become sharply peaked around their expectation for large $M$. In that case, we have

$$ \gamma_3 \approx \frac{9}{2M} , \tag{5.22} $$

and we see that, in the above sense, almost all discrepancies have a Gaussian distribution in the
limit where $M$, the number of modes, approaches infinity.



5.2 Applications to different examples 
5.2.1 Fastest approach to a Gaussian limit 

We now examine the various definitions of discrepancies, and assess their approach to a Gaussian
limit. Usually this is envisaged, for instance in [26], as the limit where the dimensionality of the
integration region becomes very large. But, as we have shown, this is only a special case of the
more general situation where the number of relevant modes becomes very large: another possible
case is that where, in one dimension, the number of modes with essentially equal strength $\sigma_n^2$
becomes very large. As an illustration, consider the case where the basis functions with the
Gaussian measure are orthonormal and $M$ of the nontrivial modes have equal strength $\sigma_n^2 = 1/M$,
and the rest have strength zero. The moment-generating function then takes on a particularly
simple form, and so does the discrepancy distribution [24]:

$$ \log G_0(z) = -\frac{M}{2}\log\left(1 - \frac{2z}{M}\right) , \quad H(t) = \frac{(M/2)^{M/2}}{\Gamma(M/2)}\,t^{M/2-1}e^{-Mt/2} . \tag{5.23} $$

It is easily seen that the gamma-distribution $H(t)$ approaches a normal one when $M$ becomes
very large. At the same time, we see the 'physical' reason behind this: it is the fact that the
singularity of $G_0(z)$ in the complex plane (in the more general case, the singularity nearest
to $z = 0$) moves away to infinity. One observation is relevant here: in the inverse Laplace
transform, to go from $G_0$ to $H$, we have kept the integration along the imaginary axis $\mathrm{Re}\,z = 0$.
We might consider performing a saddle-point integration, with a non-vanishing value of $\mathrm{Re}\,z$.
That may give us, for a finite number of modes, a good approximation to the actual form of
$H(t)$. It is quite possible, and, indeed, it happens in the above equal-strength model, that this
approximation is already quite similar to a Gaussian. In the equal-strength model, a saddle-point
approximation for $H(t)$ gives precisely the form of Eq.(5.23), the only difference being that
$\Gamma(M/2)$ is replaced by its Stirling approximation. On the other hand, for not-so-large $M$, this
form is not too well approximated by a Gaussian centered around $t = 1$, since the true maximum
resides at $t = 1 - 2/M$. Nevertheless, in this chapter we are only interested in the limiting
behavior of $H(t)$, and we shall stick to the use of condition (5.18) as an indicator of the Gaussian
limit.

One interesting remaining observation is the following. For any finite number $M$ of eigenvalues
$\lambda_n$ ($n = 1, 2, \ldots, M$), the smallest value of the indicator $\lambda_s^2/\sum_n\lambda_n^2$ is obtained when
$\lambda_n = 1/M$ for all $n$. In this sense, the equal-strengths model gives, for finite $M$, that discrepancy
distribution that is closest to a Gaussian.
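
The equal-strength model lends itself to a direct numerical comparison. The sketch below assumes that the density of this model is the gamma density with shape $M/2$ and rate $M/2$ (mean 1, variance $2/M$), and compares it on a grid with the normal density of the same mean and variance. The grid parameters and function names are hypothetical.

```python
import math

def h(t, m):
    """Gamma density with shape m/2 and rate m/2 (assumed equal-strength density)."""
    if t <= 0.0:
        return 0.0
    a = 0.5 * m
    return math.exp(a * math.log(a) - math.lgamma(a) + (a - 1.0) * math.log(t) - a * t)

def gauss(t, m):
    """Normal density with mean 1 and variance 2/m."""
    variance = 2.0 / m
    return math.exp(-((t - 1.0) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def grid(steps=4000, t_max=4.0):
    """Equidistant grid on (0, t_max]."""
    return [t_max * j / steps for j in range(1, steps + 1)]

def mode(m):
    """Grid point at which the gamma density is largest."""
    return max(grid(), key=lambda t: h(t, m))

def relative_deviation(m):
    """Largest pointwise difference between the two densities, in units of the Gaussian peak."""
    return max(abs(h(t, m) - gauss(t, m)) for t in grid()) / gauss(1.0, m)

for m in (10, 20, 100, 400):
    print(m, mode(m), relative_deviation(m))
```

The printed mode should sit near $1 - 2/M$ rather than at 1, and the relative deviation from the Gaussian should shrink as $M$ grows.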

5.2.2 The L2-discrepancy 

Here we shall discuss the standard L2-discrepancy (Section 3.3.1). The eigenfunctions Ua are 
equal to 2*/^ 0^=1 ^^s ((rtv + ^)7rx^) so that (tLfi) = l^^'^ua where the strengths, and the matrix 
r, are given by 



The components of the integer vector ft can take all non-negative integer values, including 
zero. The eigenvalue equation for the eigenvalues A of P can be written down easily: 



The strengths ffft are degenerate in the values they take. Labeling the strengths with different 
values by with p = 0^=1 + 1 ), the degeneracy is given by 





(5.24) 




(5.25) 




(5.26) 



We introduced the logical step function here, which is simply defined by 




(5.27) 
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So A = ffp is a solution to the eigenvalue equation with a (Qw(p] — 1 )-fold degeneracy. If we 



factorize these solutions we obtain the following equation for the remaining eigenvalues: 

l-r^Qw(p]^^ = . (5.28) 



Some assertions concerning the remaining eigenvalues can be made using this equation. On 
inspection, it can be seen that there are no negative solutions, nor solutions larger than erf, so 
that (jf can be used as an upper bound of the eigenvalues of V. If we order the A such that 
Ai > A3 > . . . , then erf > A, > ff| > A3 > . . . . This implies that Tr(r'"] = Qw(p) (jf" - e 
where < e < Now we have 

LQw(p]< = Ul^cr , UV) = LtI^^^ - (5.29) 

P ^ Tl>0 

and therefore, for k > 3, that 



The second factor decreases monotonically from (IS)"^ for s = 1 to one as s — > 00; for the first 
factor, we note that $1 < \xi(2k) < \xi(4)$ for all $k > 2$. Therefore $\gamma_k$ can be made arbitrarily
small by choosing $s$ large enough, and the Gaussian limit of high dimensionality is proven.
Note, however, that the approach is not particularly fast: for large $s$, we have $\gamma_3 \sim (24/25)^s \sim
\exp(-s/25)$, so that $s$ has to become of the order of one hundred or so to make the Gaussian
behavior manifest. In fact, this was already noted by explicit numerical computation in [22].

5.2.3 The Fourier diaphony 

In the case of the Fourier diaphony (Section 3.3.3), the eigenfunctions are in the Landau gauge
by definition, so that the matrix $\Gamma$ is just given by

$$ \Gamma_{\vec m,\vec n} = \sigma_{\vec n}^2\delta_{\vec m,\vec n} , \tag{5.31} $$

with the strengths $\sigma_{\vec n}$ as in Eq.(3.74). The normalization of the strengths ensures that $E(D_N) = 1$,
independent of $s$. In this case, keeping in mind that sines and cosines occur in the eigenfunctions
with equal strength, we have to consider the multiplicity function

$$ Q_s^{\Pi}(p) := \sum_{\vec n\ge0}\theta\left(p = \prod_{\nu=1}^{s}r(n_\nu)\right) . \tag{5.32} $$

Actually, before assigning a strength $\sigma_{\vec n}$, or rather $\sigma_p$, we have to know the behavior of $Q_s^{\Pi}(p)$ in
order to ensure convergence of $E(D_N)$. In order to do so, we introduce the Dirichlet generating
function for $Q_s^{\Pi}(p)$:

$$ F_s^{\Pi}(x) := \sum_{p>0}\frac{Q_s^{\Pi}(p)}{p^x} = [1 + 2\zeta(x)]^s , \tag{5.33} $$
where we use the Riemann $\zeta$ function. Since this function (and, therefore, $F_s^{\Pi}(x)$ as well),
converges for all $x > 1$, we are ensured that $Q_s^{\Pi}(p)$ exceeds the value $cp^{1+\epsilon}$ at most for a finite
number of values of $p$, for all positive $c$ and $\epsilon$. This is proven in Appendix 5C. It is therefore
sufficient that $\sigma_p^2$ decreases as a power (larger than 1) of $p$. In fact, taking

$$ \sigma_p^2 = cp^{-\beta} , \quad \beta > 1 , \tag{5.34} $$

we immediately have that

$$ R_k = \sum_{\vec n>0}\sigma_{\vec n}^{2k} = \sum_{p>0}Q_s^{\Pi}(p)\,\sigma_p^{2k} = c^k\left[(1 + 2\zeta(k\beta))^s - 1\right] , \tag{5.35} $$

which, for given $\beta$, fixes $c$ such that $R_1 = E(D_N) = 1$, and, moreover, gives

$$ \gamma_3 \sim a(\beta)^s \ \text{ as } s \to \infty , \quad a(\beta) = \frac{(1 + 2\zeta(3\beta))^2}{(1 + 2\zeta(2\beta))^3} . \tag{5.36} $$

In Section 3.3.3, the value $\beta = 2$ is used, with $a(2) \approx 0.291$. The supremum of $a(\beta)$ equals
$1/3$, as $\beta \to \infty$, and the (more interesting) infimum is $a(1)$, about 0.147. We conclude that, for
all diaphonies of the above type, the Gaussian limit appears for high dimensionality. For large
$\beta$, where the higher modes are greatly suppressed, the convergence is slowest, in accordance
with the observation that the 'equal-strength' model gives the fastest convergence; however, the
convergence is still much faster than for the L2-discrepancy, and the Gaussian approximation is
already quite good for $s \approx 4$. The fastest approach to the Gaussian limit occurs when we force all
modes to have as equal a strength as is possible within the constraints on $\beta$. The difference
between the supremum and infimum of $a(\beta)$ is, however, not much more than a factor of 2.
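
The quoted values of $a(\beta)$ can be reproduced with a few lines of code. The sketch below assumes the form $a(\beta) = (1 + 2\zeta(3\beta))^2/(1 + 2\zeta(2\beta))^3$, which is an assumption that matches the numbers 0.291, 0.147 and 1/3 quoted in the text. The zeta function is evaluated by a partial sum with an Euler-Maclaurin tail correction; the function names and the number of terms are hypothetical.

```python
def zeta(x, terms=1000):
    """Riemann zeta for real x > 1: partial sum plus Euler-Maclaurin tail correction."""
    n = terms
    head = sum(k ** -x for k in range(1, n))
    tail = n ** (1.0 - x) / (x - 1.0) + 0.5 * n ** -x + x * n ** (-x - 1.0) / 12.0
    return head + tail

def a(beta):
    """Assumed base of the exponential decay of gamma_3 with the dimension s."""
    return (1.0 + 2.0 * zeta(3.0 * beta)) ** 2 / (1.0 + 2.0 * zeta(2.0 * beta)) ** 3

for beta in (1.0, 1.5, 2.0, 4.0, 20.0):
    print(beta, a(beta))
```

The printed values should increase from about 0.147 at $\beta = 1$ towards 1/3 for large $\beta$, so the ratio between the two extremes stays close to 2.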

Another possibility would be to let $\sigma_p^2$ depend exponentially on $p$. In that way one can
ensure convergence of the $R_k$ while at the same time enhancing as many low-frequency modes
as possible. It is proven in Appendix 5C that the function

F?^W := XQP^P)^" (5.37) 

p>0 

has radius of convergence equal to one, and therefore we may take $\sigma_p^2 = (\beta')^p$ with $\beta'$ between
zero and one. If we choose $\beta'$ to be very small, we essentially keep only the modes with $p = 1$,
and therefore in that case we have $\gamma_3 \sim 1/(3^s - 1)$. This is of course in reality the same type
of discrepancy as the above one, with $\beta \to \infty$. On the other hand, taking $\beta' \to 1$ we arrive
at $\gamma_3 \to 0$ (see, again, Appendix 5C). The difference with the first model is, then, that we
can approach the Gaussian limit arbitrarily fast, at the price, of course, of having a function
$\mathcal{B}(x_k, x_l)$ that is indistinguishable from a Dirac $\delta$-distribution in $x_k - x_l$, and hence meaningless
for practical purposes.

5.2.3.1 Fourier diaphony with sum clustering 

In the above, we have let the strength $\sigma_{\vec n}$ depend on the product of the various $r(n_\nu)$. This
can be seen as mainly a matter of expediency, since the generalization to $s > 1$ is quite simple
in that case. From a more 'physical' point of view, however, this grouping of the $\sigma$ is not so
attractive, if we keep in mind that each $\vec n$ corresponds to a mode with wave vector $\vec k(\vec n)$. Under
the product rule, wave vectors differing only in their direction but with equal length may acquire
vastly different weights: for instance, $\vec k = (m\sqrt{s}, 0, 0, \ldots)$ and $\vec k = (m, m, m, \ldots)$ have equal
Euclidean length, $m\sqrt{s}$, but their strengths under the product rule are $1/(sm^2)$ and $1/(m^{2s})$,
respectively. This lack of 'rotational' symmetry could be viewed as a drawback in a discrepancy
distinguished by its nice 'translational' symmetry. One may attempt to soften this problem by
grouping the strengths $\sigma_{\vec n}$ in another way, for instance by taking



$$ \sigma_{\vec n}^2 = \sigma_p^2 , \quad p = \sum_{\nu=1}^{s}k(n_\nu) , \tag{5.38} $$

so that $\sigma$ depends on the sum of the components rather than on their product. The multiplicity of
a given strength now becomes, in fact, somewhat simpler:

$$ Q_s^{\Sigma}(p) := \sum_{\vec n\ge0}\theta\left(p = \sum_{\nu=1}^{s}k(n_\nu)\right) = \sum_{m\ge0}\binom{s}{m}\binom{s - 1 + p - m}{p - m} , \tag{5.39} $$

where the last identity follows from the generating function

$$ F_s^{\Sigma}(x) := \sum_{p\ge0}Q_s^{\Sigma}(p)\,x^p = \left(\frac{1 + x}{1 - x}\right)^s . \tag{5.40} $$

This also immediately suggests the most natural form for the strength: $\sigma_p^2 = \beta^p$, where $p$ is
$\sum_\nu k(n_\nu)$ as above. We see that $R_1$ converges as long as $\beta < 1$, and moreover,

$$ \gamma_3 \sim a(\beta)^s , \quad a(\beta) = \left(\frac{1 + \beta^3}{1 - \beta^3}\right)^2\left(\frac{1 - \beta^2}{1 + \beta^2}\right)^3 , \tag{5.41} $$

where $a(\beta)$ has supremum $a(0) = 1$, and decreases monotonically with increasing $\beta$. For $\beta$
close to one, we have $a(\beta) \sim 4(1 - \beta)/9$, so that the Gaussian limit can be reached as quickly as
desired (again with the reservations mentioned above). At the other extreme, note that for very
small $\beta$ we shall have

$$ \gamma_3 \approx \frac{1}{2s} \quad\text{if}\quad s\beta^2 \ll 1 . \tag{5.42} $$

This just reflects the fact that, for extremely small $\beta$, only the $2s$ lowest nontrivial modes contribute
to the discrepancy; and even in that case the Gaussian limit is attained, although much
more slowly. The criterion that determines whether the behavior of $\gamma_3$ with $s$ and $\beta$ is exponential
or of type $1/(2s)$ is seen to be whether $s\beta^2$ is considered to be large or small, respectively.
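
The counting behind sum clustering can be checked by brute force. The sketch below assumes that in one dimension there is one mode with $p = 0$ and two modes (one sine, one cosine) for every $p \ge 1$, so that the $s$-dimensional multiplicities are the coefficients of $((1+x)/(1-x))^s$; the closed binomial form, the expression used for $a(\beta)$ and the one used for $\gamma_3$ are assumptions consistent with the limits quoted in the text. All function names are hypothetical.

```python
from math import comb

def multiplicity(s, p):
    """Coefficient of x^p in ((1 + x) / (1 - x))^s, written as a binomial sum."""
    return sum(comb(s, m) * comb(s - 1 + p - m, p - m) for m in range(min(s, p) + 1))

def multiplicity_brute(s, p_max):
    """Coefficients of (1 + 2x + 2x^2 + ...)^s up to x^p_max, by polynomial multiplication."""
    one_dim = [1] + [2] * p_max
    coeffs = [1] + [0] * p_max
    for _ in range(s):
        new = [0] * (p_max + 1)
        for i, ci in enumerate(coeffs):
            for j in range(p_max + 1 - i):
                new[i + j] += ci * one_dim[j]
        coeffs = new
    return coeffs

def a(beta):
    """Assumed base of the exponential decay of gamma_3 for strengths beta^p."""
    return ((1 + beta ** 3) / (1 - beta ** 3)) ** 2 * ((1 - beta ** 2) / (1 + beta ** 2)) ** 3

def gamma3(s, beta):
    """gamma_3 = R_3^2 / R_2^3 with R_k = ((1 + beta^k) / (1 - beta^k))^s - 1."""
    def r(k):
        return ((1 + beta ** k) / (1 - beta ** k)) ** s - 1
    return r(3) ** 2 / r(2) ** 3

print([multiplicity(3, p) for p in range(6)])
print(a(0.5), gamma3(5, 0.01), 1 / (2 * 5))
```

For small $\beta$ the value of `gamma3(s, beta)` should approach $1/(2s)$, reflecting the $2s$ lowest nontrivial modes, while for $\beta$ close to one `a(beta)` should behave like $4(1-\beta)/9$.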

Another alternative might be a power-law-like behavior of the strengths, such as $\sigma_p^2 = 1/p^{\alpha}$.
Also in this case we may compute the $R_k$, as follows:



1 



p>0 



1 

f(ka) . 



(5.43) 
from which it follows that $\alpha > s$ is needed to ensure convergence of $E(D_N)$. In the large-$s$ limit, we
therefore find that, also in this case, $\gamma_3 \to 1/(2s)$.

5.2.3.2 Fourier diaphony with spherical clustering 

A clustering choice which is, at least in principle, even more attractive from the symmetry point
of view than sum clustering, is to let $\sigma_{\vec n}$ depend on $|\vec k(\vec n)|$, hence assuring the maximum possible
amount of rotational invariance under the constraint of translational invariance. We therefore
consider the choice

$$ \sigma_{\vec n}^2 = \exp\left(-\alpha\sum_{\nu=1}^{s}k(n_\nu)^2\right) . \tag{5.44} $$

For the function $\mathcal{B}(x_1, x_2) = \mathcal{B}(x_1 - x_2)$ we now have the following two alternative forms,
related by Poisson summation:

$$ \mathcal{B}(x) = -1 + \prod_{\nu=1}^{s}\left(\sum_{k=-\infty}^{+\infty}e^{-\alpha k^2}\cos(2\pi kx^{\nu})\right) = -1 + \left(\frac{\pi}{\alpha}\right)^{s/2}\sum_{\vec m}\exp\left(-\frac{\pi^2|\vec x + \vec m|^2}{\alpha}\right) , \tag{5.45} $$

of which the first converges well for large, and the second for small, values of $\alpha$; the sum over
$\vec m$ extends over the whole integer lattice. The $R_k$ are, similarly, given by

$$ R_k = \left(\sum_{q=-\infty}^{+\infty}e^{-k\alpha q^2}\right)^s - 1 = \left(\frac{\pi}{k\alpha}\right)^{s/2}\left(\sum_{m=-\infty}^{+\infty}e^{-\pi^2m^2/(k\alpha)}\right)^s - 1 . \tag{5.46} $$

For large $\alpha$ (where, again, only the first few modes really contribute) we recover, again, the limit
$\gamma_3 \to 1/(2s)$ as $s \to \infty$; for small $\alpha$ we have, again, an exponential approach to the Gaussian
limit:

$$ \gamma_3 \sim \left(\frac{8\alpha}{9\pi}\right)^{s/2} \ \text{ as } s \to \infty . \tag{5.47} $$

The distinction between the two limiting behaviors is now the magnitude of the quantity $se^{-2\alpha}$,
which now takes over the role of the $s\beta^2$ of the previous paragraph.

5.2.4 The Walsh diaphony 

Another type of diaphony is based on Walsh functions, which are defined as follows. Let, in one
dimension, the real number $x$ be given by the decomposition

$$ x = 2^{-1}x_1 + 2^{-2}x_2 + 2^{-3}x_3 + \cdots , \quad x_i \in \{0, 1\} , \tag{5.48} $$

and let the nonnegative integer $n$ be given by the decomposition

$$ n = n_1 + 2n_2 + 2^2n_3 + 2^3n_4 + \cdots , \quad n_i \in \{0, 1\} . \tag{5.49} $$

Then, the $n$th Walsh function $W_n(x)$ is defined as

$$ W_n(x) := (-1)^{n_1x_1 + n_2x_2 + n_3x_3 + \cdots} . \tag{5.50} $$

The extension to the multidimensional case is of course straightforward, and it is easily seen that
the Walsh functions form an orthonormal set. The Walsh diaphony is then given by



D 



N 



n>0 



N 



(5.51) 
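
The orthonormality of the Walsh functions is easy to verify on a dyadic grid. The sketch below assumes the standard digit-wise definition, in which the $n$th Walsh function at $x$ equals $(-1)$ raised to the sum of products of the binary digits of $n$ (least significant first) and of $x$ (most significant first); the function names and the number of digits are hypothetical.

```python
def walsh(n, x, bits=16):
    """Walsh function W_n(x) for x in [0, 1), from the binary digits of n and x."""
    exponent = 0
    for i in range(1, bits + 1):
        x *= 2.0
        x_i = int(x)               # i-th binary digit of x
        x -= x_i
        n_i = (n >> (i - 1)) & 1   # i-th binary digit of n, least significant first
        exponent += n_i * x_i
    return -1 if exponent % 2 else 1

def inner_product(n, m, bits=4):
    """Integral of W_n * W_m over [0, 1), exact for n, m < 2**bits."""
    size = 2 ** bits
    # Both functions are constant on the dyadic intervals; sample their midpoints.
    points = [(j + 0.5) / size for j in range(size)]
    return sum(walsh(n, x) * walsh(m, x) for x in points) / size

gram = [[inner_product(n, m) for m in range(8)] for n in range(8)]
```

The matrix `gram` should be the identity, which is the orthonormality statement for the first eight functions.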



In [20], the following choice is made: 



r(n) := e(n = 0) + e(n > 0) ^2^9 (2^ < n < 2^+^) 

p>0 



(5.52) 



Note that, in contrast to the Fourier case where each mode of frequency $n$ contains two basis 
functions (one sine and one cosine), the natural requirement of 'translational invariance' in this 
case requires that the Walsh functions from $2^p$ up to $2^{p+1}$ get equal strength. The clusterings are 
therefore quite different from the Fourier case. We slightly generalize the notions of [26], and 
write 

$$\sigma^2_{\vec n} = \prod_{\mu=1}^{s} r(n_\mu) , \qquad r(n) = \theta(n=0) + \theta(n>0)\sum_{p\ge0}\alpha\beta^{p}\,\theta(2^p \le n < 2^{p+1}) . \qquad (5.53)$$



Here, we have disregarded the overall normalization of the $\sigma$'s since it does not influence the 
Gaussian limit. It is an easy matter to compute the $R_k$; we find 

$$R_k = \sum_{\vec n>0}\sigma^{2k}_{\vec n} = \Big(1+\frac{\alpha^k}{1-2\beta^k}\Big)^{s} - 1 , \qquad (5.54)$$



so that the requirement $E(D_N) = R_1 < \infty$ implies that we must have $\beta < 1/2$. Therefore, for 
not too small values of $\alpha$, we have 

$$\gamma_3 \sim a(\alpha,\beta)^s , \qquad a(\alpha,\beta) = \frac{\big(1+\alpha^3/(1-2\beta^3)\big)^2}{\big(1+\alpha^2/(1-2\beta^2)\big)^3} . \qquad (5.55)$$

The choice made in [20] corresponds to $\alpha = 1$ and $\beta = 1/4$, for which we find $a(1,1/4) \approx 0.4197$. 
The Gaussian limit should, therefore, be a good approximation for $s$ larger than 6 or so. 
An interesting observation is that for fixed $\beta$, $a(\alpha,\beta)$ attains a minimum at $\alpha = (1-2\beta^3)/(1-2\beta^2)$, 
so that the choice $\beta = 1/4$ could in principle lead to $a(31/28,1/4) = 0.4165$ with a 
marginally faster approach to the Gaussian. The overall infimum is seen to be $a(3/2,1/2) = 2/11 \approx 0.182$. 
As in the Fourier case with product clustering and a power-law strength, there is 
a limit on the speed with which the Gaussian is approached: in both cases this is directly related 
to the type of clustering. 
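
The numbers quoted for $a(\alpha,\beta)$ are easy to reproduce. The following sketch evaluates the ratio $(1+\alpha^3/(1-2\beta^3))^2/(1+\alpha^2/(1-2\beta^2))^3$ and the minimizing $\alpha$ for fixed $\beta$; the function names are illustrative.

```python
def a(alpha, beta):
    """Base of the exponential approach to the Gaussian limit, gamma_3 ~ a**s."""
    numerator = (1.0 + alpha ** 3 / (1.0 - 2.0 * beta ** 3)) ** 2
    denominator = (1.0 + alpha ** 2 / (1.0 - 2.0 * beta ** 2)) ** 3
    return numerator / denominator


def best_alpha(beta):
    """Value of alpha that minimizes a(alpha, beta) for fixed beta."""
    return (1.0 - 2.0 * beta ** 3) / (1.0 - 2.0 * beta ** 2)


if __name__ == "__main__":
    print(a(1.0, 0.25))                 # choice alpha = 1, beta = 1/4
    print(best_alpha(0.25))             # 31/28
    print(a(best_alpha(0.25), 0.25))    # slightly smaller than a(1, 1/4)
    print(a(1.5, 0.5), 2.0 / 11.0)      # value at the boundary beta = 1/2
```

Setting the derivative of $\log a$ with respect to $\alpha$ to zero gives $\alpha(1-2\beta^2) = 1-2\beta^3$, which is the minimum used in `best_alpha`.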

At the other extreme, for very small $\alpha$ we find the limiting behavior 

$$\gamma_3 \sim \frac{1}{s}\,\frac{(1-2\beta^2)^3}{(1-2\beta^3)^2} . \qquad (5.56)$$

Again in this case, the slowest possible approach to the Gaussian limit is like $1/s$, directly related 
to the symmetry of the discrepancy definition with respect to the various coordinate axes. 



5.2.5 The Lego discrepancy 

In the case of the Lego discrepancy (Section 3.3.4), the matrix $\Gamma_{m,n}$ has indices that label the 
bins $A_n$ ($n = 1,2,\ldots,M$) the hypercube is dissected into, where $M$ is the total number of 
bins. Because the characteristic functions of the bins are not normalized, the matrix looks a bit 
different: 

$$\Gamma_{m,n} = \sigma_m\sigma_n\,(w_m\delta_{m,n} - w_m w_n) , \qquad (5.57)$$

where $w_n := \langle\vartheta_n\rangle$ is the volume of bin $A_n$. This matrix satisfies $\mathrm{Tr}(\Gamma^p) = R_p$ for all $p > 0$. 
We shall now examine under what circumstances the criterion (5.18) for the appearance of the 
Gaussian limit is fulfilled. The eigenvalues $\lambda_i$ of the matrix $\Gamma_{m,n}$ are given as the roots of the 
eigenvalue equation 

$$\Big(\prod_{m=1}^{M}(\sigma_m^2 w_m - \lambda)\Big)\Big(1 - \sum_{n=1}^{M}\frac{\sigma_n^2 w_n^2}{\sigma_n^2 w_n - \lambda}\Big) = 0 . \qquad (5.58)$$

It is seen that there is always one zero eigenvalue (the corresponding eigenvector has $1/\sigma_m$ for 
its $m$-th component). Furthermore the eigenvalues are bounded by $\max_m(\sigma_m^2 w_m)$, and this bound 
is an eigenvalue if there is more than one $m$ for which the maximum is attained. At any rate, we 
have for our criterion, that 



Since the generality of the Lego discrepancy allows us to choose from a multitude of possibilities 
for the $\sigma$'s and $w$'s, we now concentrate on a few special cases. 

1. All $w_m$ equal. This models integrands whose local details are not resolved within areas 
smaller than $1/M$, but whose magnitude may fluctuate. In that case, we have 

o ^ 1 (max^g^)^ 
and a sufficient condition for the Gaussian limit is for this bound to approach zero. Note 
that here, as in the general case, only bins $m$ with $\sigma_m \neq 0$ contribute to the discrepancy as 
well as to the criterion $\rho_s$, so that one has to be careful with models in which the integrand 
is fixed at zero in a large part of the integration region: this type of model was, for instance, 
examined in [21]. 

2. All $\sigma_m$ equal. In this case, the underlying integrands have more or less bounded magnitude, 
but show finer detail in some places (with small $w$) than in other places (with larger $w$). 
Now, it is simple to prove that 

$$\rho_s \le \frac{M w^2}{1 - 2w + 1/M} , \qquad w := \max_m w_m , \qquad (5.61)$$

so that a sufficient condition is that $Mw^2$ should approach zero. 

3. All $\sigma_m^2 w_m$ equal. In this case, the discrepancy is the $\chi^2$-statistic for the data points 
distributed over the bins with expected fraction of points $w_m$. We simply have 

= (M + 2)(M-1) ' ^^-^^^ 
and the Gaussian limit follows whenever $M \to \infty$. 
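
The statements about the spectrum of the Lego matrix can be checked numerically. The sketch below builds $\Gamma_{m,n} = \sigma_m\sigma_n(w_m\delta_{m,n} - w_mw_n)$, verifies the zero eigenvector and the bound on the eigenvalues, and, for equal strengths, compares the ratio $\lambda_{\max}^2/\mathrm{Tr}(\Gamma^2)$ with $Mw^2/(1-2w+1/M)$. Taking this ratio as the quantity controlled by the criterion is an assumption made here for illustration, since the criterion (5.18) is not restated in this section; all function names are hypothetical.

```python
import math


def gamma_matrix(sigma, w):
    """Gamma[m][n] = sigma_m sigma_n (w_m delta_mn - w_m w_n)."""
    size = len(w)
    return [[sigma[m] * sigma[n] * ((w[m] if m == n else 0.0) - w[m] * w[n])
             for n in range(size)] for m in range(size)]


def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def largest_eigenvalue(matrix, iterations=500):
    """Power iteration with a final Rayleigh quotient (symmetric, non-negative matrix)."""
    v = [1.0 + 0.1 * i for i in range(len(matrix))]
    for _ in range(iterations):
        u = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return sum(x * y for x, y in zip(v, mat_vec(matrix, v)))


def trace_of_square(matrix):
    size = len(matrix)
    return sum(matrix[m][n] * matrix[n][m] for m in range(size) for n in range(size))


def ratio(matrix):
    """lambda_max^2 / Tr(Gamma^2); assumed here to be the relevant criterion quantity."""
    return largest_eigenvalue(matrix) ** 2 / trace_of_square(matrix)


def equal_strength_bound(w):
    big = max(w)
    return len(w) * big ** 2 / (1.0 - 2.0 * big + 1.0 / len(w))


if __name__ == "__main__":
    raw = [1.0 + 0.5 * (n % 2) for n in range(20)]
    w = [x / sum(raw) for x in raw]
    g = gamma_matrix([1.0] * 20, w)
    print(largest_eigenvalue(g), max(w))
    print(ratio(g), equal_strength_bound(w))
```

Because the Rayleigh quotient never exceeds the largest eigenvalue of a symmetric matrix, the inequalities printed above hold even if the power iteration has not fully converged.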



5.3 Conclusions 

We derived the probability distribution of quadratic discrepancies over the ensemble of truly 
random point sets, in the limit of a large number of points. We have shown under what conditions 
this distribution tends to a Gaussian. In particular, the question of the limiting behavior of a 
given distribution can be reduced to solving an eigenvalue problem. Using the knowledge of the 
eigenvalues for a given function class it is possible to determine under which conditions and how 
fast the Gaussian limit is approached. Finally, we have investigated the limiting behavior of the 
probability distribution for the discrepancy of several function classes explicitly. 

The discrepancy that approaches the Gaussian limit fastest is obtained for the model in which 
the number of modes with non-zero equal strength goes to infinity, while the sum of the strengths 
is fixed. In fact, we give an argument why we cannot improve much on this limit. However, a 
drawback of this model is that the discrepancy itself becomes a sum of Dirac $\delta$-distributions in 
this limit: it only measures whether points coalesce, and is therefore not very useful in practice. 

Secondly we looked at the L2-discrepancy. Here a Gaussian distribution appears in the limit 
of a large number of dimensions. It is however a very slow limit: only when the number of 
dimensions becomes of the order O (10^) does the Gaussian behavior become manifest. 

For the different diaphonies the choice of the mode-strengths is more arbitrary. The strengths 
we discuss are chosen on the basis of some preferred global properties of the diaphony, such 
as translation and/or rotation invariance. Again for large dimensions the Gaussian limit is 
attained, either as a power law or as the inverse of the number of dimensions. It is possible to 
choose the strengths in such a way that the Gaussian limit is approached arbitrarily fast. But the 
diaphony corresponding to that case again consists of a sum of Dirac $\delta$-distributions. 

Finally, for the Lego discrepancy, we can assign strengths to the different modes in several 
ways. One example is to keep the product of the squared strength and volume of the modes fixed; 
the Gaussian limit is then reached for a large number of modes. 

All these results have been derived in the limit of a large number of points. It remains to be seen, 
however, whether this is reasonable in practice. To determine when the asymptotic regime sets in, 
i.e. for which value of $N$, it is necessary to take into account the next-to-leading contributions, 
which will be calculated in the following chapter. 



5.4 Appendices 

Appendix 5A: The form of Go (z) 

In this Appendix, we derive the result (5.10) for the form of $G_0(z)$. We introduce the notation 
$[BA^kB] := \sum_{m,n} B_m (A^k)_{m,n} B_n$ for matrices $A$ and vectors $B$, and the general form of the 
matrix $P$: 

$$P_{m,n} = A_{m,n} - B_m B_n . \qquad (5.63)$$

The $k$-th power of this matrix has the general form 

P,q,^o,i,2,.,.>0^°'^^'^^' " ' r>0 

with the constraint $k-1 = p + q + \nu_0 + 2\nu_1 + 3\nu_2 + \cdots$. The combinatorial factor follows 
directly from the possible positionings of the dyadic factors $-B_mB_n$. Multiplying by $(2t)^{k-1}$ 
and summing over the $k$ then gives us immediately 

/ r \ V 1 V r,>o('r+1)(2t)nBA^B] 

^' [t^) = i:(2t)-'T,A'', - . (5.65) 

where the factor with $r + 1$ comes from the double sum over $p$ and $q$ with $p + q = r$. Upon 
integration of this result over $t$ from 0 to $z$ we find 

logGo(z) = ^i^Tr(n 

n>0 

= Y. ^'^'■^^"^ - ^ log M + ^(2z)-[BA-^B] j . (5.66) 

n>0 \ n>0 / 

If we now take $A_{m,n} = \sigma_n^2\delta_{m,n}$ and $B_n = \sigma_n\langle u_n\rangle$, we obtain (5.10) with (5.11). This result has, 
in fact, already been obtained for the case of the $L_2$-discrepancy in [22], but here we demonstrate 
its general validity for more general discrepancy measures. In those cases where $\langle u_n\rangle = 0$, the 
second term vanishes of course. 



Appendix 5B: A counterexample 

In this Appendix we prove that the condition (5.18) for the occurrence of a Gaussian limit is, in 
a sense, the best possible. Namely, consider a set of eigenvalues $\lambda_n$, again adding up to unity as 
usual, defined as follows: let $\lambda$ be a positive number, and take 

$$\lambda_1 = \lambda , \qquad \lambda_n = \begin{cases} (1-\lambda)/(M-1) & \text{for } n = 2,3,\ldots,M , \\ 0 & \text{for } n > M . \end{cases} \qquad (5.67)$$

Clearly, $\lambda$ will indeed be the maximal eigenvalue as long as $M > 1/\lambda$. Now, 

$$\frac{\lambda^2}{\sum_n \lambda_n^2} = \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2/(M-1)} , \qquad (5.68)$$

and this ratio can be driven as close to unity as desired by choosing $M$ sufficiently large. This 
shows that the simple condition $\lambda \to 0$ is not always enough to ensure the Gaussian limit. 
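
A few numbers make the counterexample concrete. The sketch below evaluates the ratio of the squared largest eigenvalue to the sum of squared eigenvalues for the spectrum described above (one eigenvalue $\lambda$, and $M-1$ equal eigenvalues sharing the remainder $1-\lambda$); the function name is illustrative.

```python
def eigenvalue_ratio(lam, M):
    """lam^2 / sum_n lambda_n^2 for lambda_1 = lam and lambda_2..M = (1 - lam)/(M - 1)."""
    rest = (1.0 - lam) / (M - 1)
    return lam ** 2 / (lam ** 2 + (M - 1) * rest ** 2)


if __name__ == "__main__":
    lam = 0.01
    for M in (200, 10 ** 4, 10 ** 6, 10 ** 8):
        print(M, eigenvalue_ratio(lam, M))
```

Even with the largest eigenvalue as small as 0.01, the ratio approaches unity once $M$ is large enough, so smallness of the largest eigenvalue alone does not control the ratio.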

Appendix 5C: The magnitude of $Q_s(p)$ 

Here we present the proofs of our various statements about the multiplicity function $Q_s(p)$ of 
Eq. (5.37). In the first place, we know that its Dirichlet generating function, $F_s^{[1]}(x)$, converges 
for all $x > 1$. Now suppose that $Q_s(p)$ exceeded $cp^{\alpha}$ an infinite number of times, with $c > 0$ 
and $\alpha > 1$. The Dirichlet generating function would then contain an infinite number of terms all 
larger than $c$, for $1 < x < \alpha$, and therefore would diverge, in contradiction with its convergence 
for all $x > 1$. 

In the second place, consider the 'standard' generating function, $F_s^{[2]}(x)$. By inspecting how 
many of the vector components of $\vec n$ are zero, we see that we may write, for $p > 1$, 

$$Q_s(p) = \sum_{t=1}^{s}\binom{s}{t}\,2^t\,d_t(p) , \qquad d_t(p) := \sum_{n_1,\ldots,n_t>0}\theta\Big(p = \prod_{\nu=1}^{t} n_\nu\Big) , \qquad (5.69)$$

so that $d_t(p)$ counts in how many ways the integer $p$ can be written as a product of $t$ factors, 
including ones; this function is discussed, for instance, in [14]. Now, for $p$ prime, we have 
$d_t(p) = t$, and therefore 

$$Q_s(p) \ge 2s\,3^{s-1} , \quad \text{equality for $p$ prime.} \qquad (5.70)$$

The radius of convergence of $F_s^{[2]}(x)$ is therefore at most equal to unity. On the other hand, we 
can obtain a very crude, but sufficient, upper bound on $Q_s(p)$ as follows. Since $d_t(p)$ is a 
nondecreasing function of $t$, we may bound $Q_s(p)$ by $(3^s - 1)\,d_s(p)$. Now let $k_p$ be the number 
of prime factors in $p$; then $k_p$ cannot exceed $\log(p)/\log(2)$, and only is equal to this when $p$ is 
a pure power of 2. Also, the number of ways to distribute $k$ objects in $s$ groups (which may be 
empty) is at most $s^k$, and is smaller if some of the objects are equal. Therefore, $d_s(p)$ is at most 
$s^{k_p}$, and we see that 



$$Q_s(p) < (3^s - 1)\,p^{\log(s)/\log(2)} , \qquad (5.71)$$

or, in short, $Q_s(p)$ is bounded¹ by a polynomial in $p$. Therefore, the radius of convergence of 
$F_s^{[2]}(x)$ is also at least unity, and we have proven the assertion in Section 5.2.3. 

Finally, we consider the limit 

(5.72) 

The same reasoning that led us to the radius of convergence shows that, for $x$ approaching 1 
from below, the function $F_s^{[2]}(x)$ behaves as $(1-x)^{-c}$ with $c \ge 1$. Therefore, $\gamma_3$ will behave as 
$(8(1-x)/9)^c$, and approach zero as $x \to 1$. Note that the upper bound on $Q_s(p)$ is extremely 
loose: but it is enough. 

¹ Note that equality cannot occur in this case since the two requirements are mutually exclusive. 
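
The counting formula and the two bounds can be tested by brute force for small $s$ and $p$. The sketch below assumes that the multiplicity counts the non-zero integer vectors $\vec n$ whose components satisfy $\prod_\mu \max(1,|n_\mu|) = p$; this reading is inferred from the decomposition into $t$ non-zero components and is an assumption, as Eq. (5.37) is not restated here. The names `d`, `Q` and `Q_brute` are illustrative.

```python
from itertools import product
from math import comb, log


def d(t, p):
    """Number of ordered ways to write p as a product of t positive integers."""
    if t == 0:
        return 1 if p == 1 else 0
    return sum(d(t - 1, p // f) for f in range(1, p + 1) if p % f == 0)


def Q(s, p):
    """Multiplicity from the sum over the number t of non-zero components."""
    return sum(comb(s, t) * 2 ** t * d(t, p) for t in range(1, s + 1))


def Q_brute(s, p):
    """Direct count of non-zero integer vectors with prod max(1, |n_mu|) = p."""
    count = 0
    for n in product(range(-p, p + 1), repeat=s):
        if any(n):
            value = 1
            for component in n:
                value *= max(1, abs(component))
            if value == p:
                count += 1
    return count


def upper_bound(s, p):
    return (3 ** s - 1) * p ** (log(s) / log(2))


if __name__ == "__main__":
    for p in (2, 3, 4, 6, 7):
        print(p, Q(3, p), Q_brute(3, p), 2 * 3 * 3 ** 2, upper_bound(3, p))
```

For prime $p$ the count equals $2s\,3^{s-1}$, in line with the lower bound, while the upper bound is visibly far from tight.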



Chapter 6 

Finite-sample corrections to discrepancy 
distributions 



This chapter deals with the calculation of the $1/N$-corrections to the asymptotic probability 
distributions of quadratic discrepancies in the limit of an infinite number of random points $N$. In 
Section 6.1, the explicit diagrammatic expansion of the logarithm of the generating function up 
to and including $O(1/N^2)$ will be given. For the Lego discrepancy, the $L_2$-discrepancy in one 
dimension and the Fourier diaphony in one dimension, the explicit $1/N$-correction is calculated. 

In Chapter 5, criteria were given for the asymptotic probability distribution of several quadratic 
discrepancies to become Gaussian when a certain free parameter becomes infinitely large. 
This parameter often is the dimension $s$ of the integration region. In the case of the Lego 
discrepancy, it is the number of bins $M$. In [26], it is shown that for the Fourier diaphony a Gaussian 
limit is obtained when both $N$ and $s$ go to infinity such that $c^s/N \to 0$, where $c$ is some constant 
larger than 1. This theorem clearly gives more information about the behavior of the probability 
distribution, for it relates $s$ and $N$, whereas in Chapter 5 the limit of $N \to \infty$ is assumed before 
considering the behavior with respect to $s$ or $M$. However, the techniques of this chapter to 
calculate $1/N$-corrections to the asymptotic distributions give the opportunity to relate $s$ or $M$ 
with $N$. In Section 6.2, this leads to limits for the Lego discrepancy, which is equivalent with 
a $\chi^2$-statistic for $N$ data points distributed over $M$ bins, if $M$ as well as $N$ become infinite. In 
Section 6.3, a Gaussian limit is derived for the Fourier diaphony, which is stronger than the one 
in [26] in the sense that it provides convergence of the moments of the distribution, whereas the 
limit in [26] is weak. 
6.1 The first few orders 



The Feynman diagrams that contribute to the first few orders in the $1/N$-expansion of the 
generating function $G(z)$ of the probability distribution of quadratic discrepancies are determined, 
and are used in a few examples. 



6.1.1 The diagrammatic expansion 

To calculate a term in the $1/N$-expansion of $G$, the contribution of all diagrams that can be drawn 
using the Feynman rules, as given in Section 3.2.4, and carry the right power of $1/N$ has to be 
included. We want to stress again that we only need to calculate the connected diagrams without 
external lines. The sum of the contributions of all these diagrams gives 

$$W(z) := \log G(z) = W_0(z) + \frac{1}{N}W_1(z) + \frac{1}{N^2}W_2(z) + \cdots . \qquad (6.1)$$

Usually, a Feynman diagram is a mnemonic representing a certain contribution to a term in a 
series expansion, i.e. a label. We will use the same drawing for the contribution itself, apart from 
the symmetry factor of the diagram. For example, the contribution of the diagram [diagram] is equal 
to $Ng^2\int_K \mathcal{B}(x,x)\,dx$ and its symmetry factor is equal to $\frac12$, so that we write 

$$\frac12\,[\text{diagram}] = \frac12\,Ng^2\int_K \mathcal{B}(x,x)\,dx . \qquad (6.2)$$

6.1.1.1 The zeroth order 

The contribution to the zeroth order in $1/N$ can only come from diagrams in which the power 
of $1/N$ coming from the vertices cancels the power of $N$ coming from the fermion loops. This 
only happens in diagrams with vertices with two bosonic legs only, and in which the fermion 
lines begin and end on the same vertex. To write down their contribution, we introduce the 
two-point functions $\mathcal{B}_p$, $p = 1,2,\ldots$, defined by 

$$\mathcal{B}_1(x_1,x_2) := \mathcal{B}(x_1,x_2) , \qquad \mathcal{B}_{p+1}(x_1,x_2) := \int_K \mathcal{B}_p(x_1,y)\,\mathcal{B}(y,x_2)\,dy . \qquad (6.3)$$

The zeroth order term $W_0(z)$ is given by 

$$W_0(z) = \frac12\,[\text{one leaf}] + \frac14\,[\text{two leaves}] + \frac16\,[\text{three leaves}] + \cdots = \sum_{p=1}^{\infty}\frac{(Ng^2)^p}{2p}\int_K \mathcal{B}_p(x,x)\,dx . \qquad (6.4)$$

The factor $1/2p$ is the symmetry factor of this type of diagram with $p$ fermion "leaves". If we 
substitute $g = \sqrt{2z/N}$ in this expression, we find exactly the result of Eq. (5.4). 



6.1.1.2 The first order 

As we have seen before, bosonic two-point vertices with a closed single fermion line contribute 
with a factor 2z, and without any dependence on N. Therefore, it is useful to introduce the 
following effective vertex 



1 ... 

2'' 



:= P = N X convolution , 



(6.5) 



and the following dressed boson propagator 

[dressed line from $x$ to $y$] := [bare line] + [line with one effective vertex] + [line with two effective vertices] + $\cdots$ , (6.6) 

$$\mathcal{B}_z(x,y) := \sum_{p=1}^{\infty}(2z)^{p-1}\,\mathcal{B}_p(x,y) . \qquad (6.7)$$

In terms of the basis in the Landau gauge, it is given by 

$$\mathcal{B}_z(x,y) = \sum_n \frac{\sigma_n^2\,u_n(x)\,u_n(y)}{1 - 2z\sigma_n^2} , \qquad (6.8)$$

which is, apart from a factor $2z$, the same expression as in Eq. (67) in [23]. Notice that $\mathcal{B}_z$ and $\mathcal{B}$ 
satisfy the relation 

$$\lim_{z\to0}\mathcal{B}_z(x,y) = \mathcal{B}_{z=0}(x,y) = \mathcal{B}(x,y) \qquad \forall x,y \in K . \qquad (6.9)$$

Furthermore, notice that $\mathcal{B}_z$ and $W_0$ satisfy 

$$\frac{\partial}{\partial z}W_0(z) = \int_K \mathcal{B}_z(x,x)\,dx , \qquad (6.10)$$

and that this relation determines $W_0$ uniquely, because we know that $W_0(0)$ has to be equal to 
0 in order for the asymptotic probability distribution to be normalized to 1. 
The first order term in the expansion of W(z) is 



1 



N ' ' 8 
or, more explicitly, 

rr2 



(6.11) 



Wi(z] = -J 



Szlx.x) dx - — 

K ^ 



Sz(x,x) dx 



K 



K2 



2z3 



Szlx.xjSzlx.-yjSzly.-y) dxdxj + 

k2 5 



K2 



Sz(x,-y]^dxdy . (6.12) 



6.1.1.3 The second order 

The second order term in the expansion of $W(z)$ is denoted $\frac{1}{N^2}W_2(z)$ and is given by 

[sum of the second-order connected vacuum diagrams, each with its symmetry factor] (6.13) 

6.1.1.4 One-vertex decomposability 

For some discrepancies, the contribution of a bosonic part of a diagram that consists of two pieces 
connected by only one vertex is equal to the product of the contributions of those pieces. Such 
diagrams we call one-vertex reducible, and discrepancies with this property we call one-vertex 
decomposable. Examples of such discrepancies are those for which $\mathcal{B}$ is translation invariant, 
i.e., $\mathcal{B}(x,y) = \mathcal{B}(x+a,y+a)$ $\forall x,y,a \in K$, such as the Fourier diaphony. Also the Lego 
discrepancy with equal bins is one-vertex decomposable. In contrast, the $L_2$-discrepancy is not 
one-vertex decomposable. 

As a result of the one-vertex decomposability, many diagrams cancel or give zero. For example, 
the first and the second diagram in (6.11) cancel, and the fourth gives zero, so that 

$$\frac{1}{N}W_1(z) = [\text{diagram}] + [\text{diagram}] . \qquad (6.14)$$

To second order, only the following remains: 

We now derive a general rule of diagram cancellation. First, we extend the notion of one-vertex 
reducibility to complete diagrams, including the fermionic part, with the rule that the two pieces 
both must contain a bosonic part. Consider the following diagram 

[fermion loop with a leaf $A$ attached to one vertex] . (6.16) 



The only restriction we put on the "leaf" $A$ is that it must be one-vertex irreducible with respect 
to the vertex that connects it to the fermion loop. For the rest, it may be anything. We define the 
contribution of the leaf by the contribution of the whole diagram divided by $-N$, and denote 
it with $C(A)$. This contribution includes internal symmetry factors. Now consider a diagram 
consisting of a fermion loop as in diagram (6.16) with attached to the one vertex $n_1$ leaves of 
type $A_1$, $n_2$ leaves of type $A_2$, and so on, up to $n_p$ leaves of type $A_p$. The extra symmetry factor 
of such a diagram is $(n_1!\,n_2!\cdots n_p!)^{-1}$, and, for one-vertex decomposable discrepancies, the 
contribution is equal to the product of the contributions of the leaves, so that the total contribution 
is given by 

$$-N\prod_{q=1}^{p}\frac{C(A_q)^{n_q}}{n_q!} . \qquad (6.17)$$

Now we sum the contribution of all possible diagrams of this kind that can be made with the $p$ 
leaves, and denote the result by 

$$[\text{fermion loop with a little square}] := -N\sum_{\substack{n_1,n_2,\ldots,n_p\ge0 \\ n_1+\cdots+n_p\ge1}}\ \prod_{q=1}^{p}\frac{C(A_q)^{n_q}}{n_q!} = -N\Big(\exp\Big(\sum_{q=1}^{p}C(A_q)\Big) - 1\Big) . \qquad (6.18)$$

Because the little square in the l.h.s. of Eq. (6.18) represents all possible ways to put the leaves 
together onto one vertex, the sum of all possible ways to put the leaves onto one fermion loop is 
given by 

$$[\text{loop with one square}] + [\text{loop with two squares}] + \cdots = -N\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n}\Big(\exp\Big(\sum_{q=1}^{p}C(A_q)\Big) - 1\Big)^{n} . \qquad (6.19)$$

The $(-1)^{n-1}$ in the sum comes from the vertices and $1/n$ is the extra symmetry factor of such a 
diagram with $n$ squares. The sum can be evaluated further and is equal to 

$$-N\log\Big(\exp\Big(\sum_{q=1}^{p}C(A_q)\Big)\Big) = -N\sum_{q=1}^{p}C(A_q) , \qquad (6.20)$$

i.e., the sum of all possible ways to put $p$ different leaves onto one fermion loop is equal to 
the sum of all leaves, each of them put onto its own fermion loop. This means that diagrams 
consisting of two or more leaves put onto one fermion loop cancel. 
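
The cancellation rests on two elementary series: summing over the multiplicities of the leaves on one vertex gives an exponential minus one, and summing over the number of such vertices on a loop gives the logarithm that undoes it. A small numerical sketch, with illustrative names and with arbitrary test values standing in for the leaf contributions $C(A_q)$, is:

```python
import math


def vertex_sum(cs, nmax=30):
    """Sum over multiplicities n_q >= 0, not all zero, of prod_q C_q^n_q / n_q!."""
    total = 1.0
    for c in cs:
        total *= sum(c ** n / math.factorial(n) for n in range(nmax + 1))
    return total - 1.0


def loop_sum(cs, nmax=200):
    """Sum over n >= 1 of (-1)^(n-1)/n * (vertex sum)^n; converges for |vertex sum| < 1."""
    u = vertex_sum(cs)
    return sum((-1) ** (n - 1) * u ** n / n for n in range(1, nmax + 1))


if __name__ == "__main__":
    leaves = [0.10, 0.05, -0.20, 0.12]   # hypothetical leaf contributions
    print(loop_sum(leaves), sum(leaves))
```

Both printed numbers agree: after multiplication by $-N$, the sum over all ways to distribute the leaves over one fermion loop reduces to the sum of the single-leaf diagrams. The series is used here only inside its radius of convergence.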

Now consider the following equation, which holds for every one-vertex decomposable 
discrepancy: 

[diagrammatic equation] , (6.21) 

where we only assume that $B$ is not of the type on the l.h.s. of Eq. (6.19). The minus sign comes 
from the fact that the first diagram has one vertex less. Because the number of fermion lines a 
fermion loop consists of is equal to the number of vertices it contains, we can always pair the 
diagrams into one diagram of the l.h.s. type and one of the r.h.s. type so that they cancel. We can 
summarize the result with the rule that 

for one-vertex decomposable discrepancies, 
only the one-vertex irreducible diagrams contribute. 

6.1.2 Applications 

We apply the general formulae given above to the Lego discrepancy, the L2-discrepancy in one 
dimension and the Fourier diaphony in one dimension. 

6.1.2.1 The Lego discrepancy 

We take the strengths $\sigma_n$ equal to $1/\sqrt{w_n}$, so that the discrepancy is just the $\chi^2$-statistic that 
determines how well the points are distributed over the bins (Section 3.3.4). The propagator is 
given by 

$$\mathcal{B}(x,y) = \sum_{n=1}^{M}\frac{\vartheta_n(x)\,\vartheta_n(y)}{w_n} - 1 , \qquad (6.23)$$

and it is easy to see that $\mathcal{B}_p(x,y) = \mathcal{B}(x,y)$, $p = 2,3,\ldots$, so that the dressed propagator is 
given by 

$$\mathcal{B}_z(x,y) = \frac{1}{1-2z}\,\mathcal{B}(x,y) . \qquad (6.24)$$

The zeroth order term can be found with the relation of Eq. (6.10), which results in the following 
expression 

$$W_0(z) = -\frac12\log(1-2z)\int_K \mathcal{B}(x,x)\,dx = -\frac{M-1}{2}\log(1-2z) , \qquad (6.25)$$

which is exactly the logarithm of the generating function of the $\chi^2$-distribution (notice that this is 
by definition the distribution of the $\chi^2$-statistic in the limit of an infinite number of random data). 
To write down the first order term, we introduce 

$$M_2 = \sum_{n=1}^{M}\frac{1}{w_n} , \quad\text{and}\quad \eta(z) = \frac{2z}{1-2z} , \qquad (6.26)$$

so that 

$$W_1(z) = \frac18\,(M_2 - M^2 - 2M + 2)\,\eta(z)^2 + \frac{1}{24}\,(5M_2 - 3M^2 - 6M + 4)\,\eta(z)^3 . \qquad (6.27)$$

If the bins are equal, so that $w_n = 1/M$, $n = 1,2,\ldots,M$, then only the contribution of the 
diagrams of Eq. (6.16) remains, and the result is 

$$W_1(z) = -\frac14\,E\,\eta(z)^2 + \frac{1}{12}\,(E^2 - E)\,\eta(z)^3 , \qquad (6.28)$$

where we denote 

$$E = M - 1 . \qquad (6.29)$$
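
The quadratic term of the first-order correction fixes the $1/N$ part of the variance of the $\chi^2$-statistic: the second derivative at $z=0$ of $W_0 + W_1/N$ is $2(M-1) + (M_2 - M^2 - 2M + 2)/N$. For small $N$ and $M$ this can be compared with an exact enumeration of the multinomial distribution. The sketch below does so with rational arithmetic; the function names are illustrative, and the statistic is taken to be the usual Pearson form $\sum_n (N_n - Nw_n)^2/(Nw_n)$.

```python
from fractions import Fraction
from math import factorial


def compositions(N, M):
    """All tuples of M non-negative counts that add up to N."""
    if M == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, M - 1):
            yield (first,) + rest


def chi2_moments(w, N):
    """Exact mean and variance of the chi-square statistic for bin volumes w."""
    mean = Fraction(0)
    second = Fraction(0)
    for counts in compositions(N, len(w)):
        prob = Fraction(factorial(N))
        for c, p in zip(counts, w):
            prob *= p ** c / factorial(c)      # multinomial probability
        stat = sum((c - N * p) ** 2 / (N * p) for c, p in zip(counts, w))
        mean += prob * stat
        second += prob * stat ** 2
    return mean, second - mean ** 2


def predicted_variance(w, N):
    """2(M-1) + (M_2 - M^2 - 2M + 2)/N with M_2 = sum_n 1/w_n."""
    M = len(w)
    M2 = sum(1 / p for p in w)
    return 2 * (M - 1) + Fraction(M2 - M * M - 2 * M + 2) / N


if __name__ == "__main__":
    w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    print(chi2_moments(w, 4), predicted_variance(w, 4))
```

The mean is $M-1$ for every $N$, as the zeroth order term requires, and the variance agrees with the prediction; higher cumulants receive contributions beyond $1/N$ and are not compared here.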
To second order in $1/N$, the contribution comes from the diagrams in Eq. (6.15), and is given by 

W2(z) = (5E3-12E2 + 7E)^1§^ + (E^-eE^ + SE)^^^-*^ 



48 ^ '8 

+ (E^-28E^ + 43)^ + (-E^-SE)^ . (6.30) 

48 1 2 



In Appendix 6A, we present the expansion of G (z) in the case of equal bins, up to and including 
the 1 /N"^ term. It is calculated using the path integral expression (3.85) of G(z) and computer 
algebra. The reader may check that this expression for G(z) and the above terms of W(z) satisfy 
G(z) = e^(^^ up to the order of 1 /N^. 
6.1.2.2 The L2-discrepancy 

In one dimension, the basis in the Landau gauge is given by the set of functions $\{\cos(n\pi x),\ n = 1,2,\ldots\}$ 
(Section 3.3.1), so that the propagator is given by 

$$\mathcal{B}(x,y) = \sum_{n=1}^{\infty}\frac{2\cos(n\pi x)\cos(n\pi y)}{n^2\pi^2} = \min(1-x,1-y) + \tfrac12x^2 + \tfrac12y^2 - \tfrac23 . \qquad (6.31)$$

The dressed propagator is given by 

$$\mathcal{B}_z(x,y) = \sum_{n=1}^{\infty}\frac{2\cos(n\pi x)\cos(n\pi y)}{n^2\pi^2 - 2z} \qquad (6.32)$$

$$\phantom{\mathcal{B}_z(x,y)} = \frac{1}{u^2} - \frac{1}{2u\sin u}\,\{\cos[u(1-|x+y|)] + \cos[u(1-|x-y|)]\} , \qquad (6.33)$$

with 

$$u = \sqrt{2z} . \qquad (6.34)$$
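
Both closed forms can be compared with truncated mode sums. The following sketch does this for the bare and the dressed propagator of the one-dimensional $L_2$-discrepancy, for $0 < 2z < \pi^2$; the function names and the truncation are illustrative choices.

```python
import math


def bare_series(x, y, nmax=20000):
    return sum(2.0 * math.cos(n * math.pi * x) * math.cos(n * math.pi * y)
               / (n * math.pi) ** 2 for n in range(1, nmax + 1))


def bare_closed(x, y):
    return min(1.0 - x, 1.0 - y) + 0.5 * x * x + 0.5 * y * y - 2.0 / 3.0


def dressed_series(x, y, z, nmax=20000):
    return sum(2.0 * math.cos(n * math.pi * x) * math.cos(n * math.pi * y)
               / ((n * math.pi) ** 2 - 2.0 * z) for n in range(1, nmax + 1))


def dressed_closed(x, y, z):
    u = math.sqrt(2.0 * z)
    bracket = math.cos(u * (1.0 - abs(x + y))) + math.cos(u * (1.0 - abs(x - y)))
    return 1.0 / u ** 2 - bracket / (2.0 * u * math.sin(u))


if __name__ == "__main__":
    for x, y in ((0.2, 0.7), (0.5, 0.5), (0.0, 1.0)):
        print(bare_series(x, y), bare_closed(x, y),
              dressed_series(x, y, 0.4), dressed_closed(x, y, 0.4))
```

The truncated sums differ from the closed forms only by the tail of the series, which is of order $1/n_{\max}$.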
The zeroth order term can be obtained using Eq. (6.10): 

$$W_0(z) = -\frac12\log\Big(\frac{\sin u}{u}\Big) , \qquad (6.35)$$

which is the well-known result. After some algebra, also the first order term follows: 



Wi(z) = -l^r24-8^-7-^-7-^-2-^) . (6.36) 
288 V smu sin u tanu tan^u 



6.1.2.3 The Fourier diaphony 

We consider the one-dimensional case, with a slightly different definition than the one given in 
Section 3.3.3: we multiply the discrepancy with a factor $\pi^2/3$, so that the propagator is given by 

$$\mathcal{B}(x,y) = \sum_{n=1}^{\infty}\frac{2\cos(2n\pi\{x-y\})}{n^2} = \frac{\pi^2}{3}\,[1 - 6\{x-y\}(1-\{x-y\})] , \qquad (6.37)$$

where we use the notation $\{x\} = x \bmod 1$. The dressed propagator is given by 

$$\mathcal{B}_z(x,y) = \sum_{n=1}^{\infty}\frac{2\cos(2n\pi\{x-y\})}{n^2 - 2z} = \frac{\pi^2}{v^2}\Big(1 - \frac{v\cos[v(2\{x-y\}-1)]}{\sin v}\Big) , \qquad (6.38)$$

where 

$$v = \pi\sqrt{2z} . \qquad (6.39)$$
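
As for the $L_2$-discrepancy, the closed forms for the Fourier diaphony can be compared with truncated sums. The sketch below assumes $v = \pi\sqrt{2z}$ and $0 < 2z < 1$, and uses illustrative function names; the argument `t` stands for $\{x-y\}$.

```python
import math


def bare_series(t, nmax=20000):
    return sum(2.0 * math.cos(2.0 * math.pi * n * t) / n ** 2
               for n in range(1, nmax + 1))


def bare_closed(t):
    return math.pi ** 2 / 3.0 * (1.0 - 6.0 * t * (1.0 - t))


def dressed_series(t, z, nmax=20000):
    return sum(2.0 * math.cos(2.0 * math.pi * n * t) / (n ** 2 - 2.0 * z)
               for n in range(1, nmax + 1))


def dressed_closed(t, z):
    v = math.pi * math.sqrt(2.0 * z)
    return math.pi ** 2 / v ** 2 * (1.0 - v * math.cos(v * (2.0 * t - 1.0)) / math.sin(v))


if __name__ == "__main__":
    x, y = 0.15, 0.80
    t = (x - y) % 1.0          # {x - y}
    print(bare_series(t), bare_closed(t))
    print(dressed_series(t, 0.2), dressed_closed(t, 0.2))
```

At $t=0$ the dressed propagator integrates to $(\pi^2/v^2)(1 - v\cot v)$, which is the $z$-derivative of $-\log(\sin v/v)$ and thereby links the dressed propagator to the zeroth order term.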
This two-point function is, apart from a factor $\pi^2/v^2$, the same as the one in Eq. (26) in [24]. The 
zeroth order term can easily be obtained from the dressed propagator and is given by 

$$W_0(z) = -\log\Big(\frac{\sin v}{v}\Big) , \qquad (6.40)$$

which is in correspondence with Eq. (21) in [24]. Because the propagator is translation invariant, 
i.e., $\mathcal{B}(x+a,y+a) = \mathcal{B}(x,y)$ $\forall x,y,a \in K$, the contributions of the first two diagrams in 
Eq. (6.11) cancel, and the contribution of the fourth diagram is zero. The contribution of the 
remaining diagrams gives 



6.2 Scaling limits for the Lego discrepancy 

In this section, we take a closer look at the Lego discrepancy in the case that it is equivalent with 
a $\chi^2$-statistic for $N$ data points distributed over $M$ bins (Section 3.3.4). First, we will show that 
the natural expansion parameter in the calculation of the moment generating function is $M/N$, 
and calculate a few terms. We will see, however, that a strict limit of $M \to \infty$ does not exist, 
and, in fact, this is well known because the $\chi^2$-distribution, which gives the lowest order term 
in this expansion, does not exist if the number of degrees of freedom becomes infinite. We 
overcome this problem by going over to the standardized variable, which is obtained from the 
discrepancy by shifting and rescaling it such that it has zero expectation and unit variance. In 
fact, it is this variable for which the results in [26] and Chapter 5 were obtained. In this section, 
we derive similar results for the Lego discrepancy, depending on the behavior of the sizes of the 
bins if $M$ goes to infinity. We will see that various asymptotic probability distributions occur 
if $M, N \to \infty$ such that $M^{\alpha}/N \to$ constant with $\alpha > 0$. If, for example, the bins become 
asymptotically equal and $\alpha > \frac12$, then the probability distribution becomes Gaussian. Notice that 
this includes limits with $\alpha < 1$, which is in stark contrast with the rule of thumb that, in order to 
trust the $\chi^2$-distribution, each bin has to contain at least a few, say five (see e.g. [2]), data points. 
Our result states that, for large $M$ and $N$, the majority of bins is allowed to remain empty! 

6.2.1 Sequences and notation 

In the following, we will investigate limits in which the number of bins $M$ goes to infinity. Note 
that for each value of $M$, we have to decide on the values of the volumes $w_n$ of the bins. They 
clearly have to scale with $M$, because their sum has to be equal to one. There are, of course, 
many possible ways for the measures to scale, i.e., many double-sequences $\{w_n^{(M)},\ 1 \le n \le M,\ M > 0\}$ 
of positive numbers with 




(6.40) 




(6.41) 



M 



M 




VM > 



and 




n=1 



(6.42) 
We, however, want to restrict ourselves to discrepancies in which the relative sizes of the bins 
stay of the same order, i.e., sequences for which 

$$\inf_{n,M} Mw_n^{(M)} > 0 \quad\text{and}\quad \sup_{n,M} Mw_n^{(M)} < \infty . \qquad (6.43)$$

It will appear to be appropriate to specify the sequences under consideration by another criterion, 
which is for example satisfied by the sequences mentioned above. It can be formulated in terms 
of the objects 

$$M_p := \sum_{n=1}^{M}\big(w_n^{(M)}\big)^{1-p} , \qquad p \ge 1 , \qquad (6.44)$$

and is given by the demand that 

$$h_p \in [1,\infty) \quad \forall p \ge 1 , \quad\text{where}\quad h_p := \lim_{M\to\infty}\frac{M_p}{M^p} . \qquad (6.45)$$

Within the set of sequences we consider, there are those for which the bins become 
asymptotically equal, i.e., sequences with 

$$w_n^{(M)} = \frac{1+\varepsilon_n^{(M)}}{M} \quad\text{with}\quad \lim_{M\to\infty}\max_{1\le n\le M}|\varepsilon_n^{(M)}| = 0 , \qquad (6.46)$$

and $\varepsilon_n^{(M)} > -1$, $1 \le n \le M$ of course. They belong to the set of sequences with $h_p = 1$ $\forall p \ge 1$, 
which will allow for special asymptotic probability distributions. 

In the following analysis, we will consider functions of $M$ and their behavior if $M \to \infty$. To 
specify relative behaviors, we will use the symbols "$\sim$", "$\asymp$" and "$\prec$". The first one is used as 
follows: 

$$f_1(M) \sim f_2(M) \iff \lim_{M\to\infty}\frac{f_1(M)}{f_2(M)} = 1 . \qquad (6.47)$$

If a limit as above is not necessarily equal to one and not equal to zero, then we use the second 
symbol: 

$$f_1(M) \asymp f_2(M) \iff f_1(M) \sim c\,f_2(M) , \quad c \in (0,\infty) . \qquad (6.48)$$

We only use this symbol for those cases in which $c \neq 0$. For the cases in which $c = 0$ we use 
the third symbol: 

$$f_1(M) \prec f_2(M) \iff \lim_{M\to\infty}\frac{f_1(M)}{f_2(M)} = 0 . \qquad (6.49)$$

We will also use the $O$-symbol, and do this in the usual sense. We can immediately use the 
symbols to specify the behavior of $M_p$ with $M$, for the criterion of Eq. (6.45) tells us that 

$$M_p \asymp M^p , \qquad (6.50)$$

and that 

$$M_p \sim M^p \quad\text{if}\quad h_p = 1 . \qquad (6.51)$$

In our formulation, also the number of data points $N$ runs with $M$. We will, however, never 
denote the dependence of $N$ on $M$ explicitly and assume that it is clear from now on. Also the 
upper index at the measures $w_n$ we will omit from now on. 

6.2.2 Feynman rules 

The Feynman rules to calculate the generating function $G(z)$ in a diagrammatic expansion are 
given in Section 3.2.4. The boson propagator is a matrix in this case, i.e., 

boson propagator: [line from $n$ to $m$] $= \mathcal{B}_{n,m} = \dfrac{\delta_{n,m}}{w_n} - 1$ , (6.52) 

and boson propagators are convoluted as $\sum_{m=1}^{M} w_m\,\mathcal{B}_{m,n_1}\mathcal{B}_{m,n_2}\cdots\mathcal{B}_{m,n_p}$ in the vertices. Only 
connected diagrams have to be calculated, since 

$\log G(z)$ = the sum of the connected vacuum diagrams. (rule 1) 

Furthermore, the bosonic part of each diagram decouples completely from the fermionic part, 
and the contribution of the fermionic part can easily be determined, for 

every fermion loop only gives a factor $-N$. (rule 2) 

Because of the rather simple expression for the bosonic propagator, we are able to deduce from 
the basic Feynman rules some effective rules for the bosonic parts of the Feynman diagrams. 
Remember that the bosonic parts decouple completely from the fermionic parts. The following 
rules apply after having counted the number of fermion loops and the powers of $g$ coming from 
the vertices, and after having calculated the symmetry factor of the original diagram. When we 
mention the contribution of a diagram in this section, we refer to the contribution apart from the 
powers of $g$ and symmetry factors. This contribution will be represented by the same drawing as 
the diagram itself. 

The first rule is a consequence of the fact that 

$$\sum_{n=1}^{M} w_n\,\mathcal{B}_{m,n}\,\mathcal{B}_{n,k} = \mathcal{B}_{m,k} , \qquad (6.53)$$

and states that 

all vertices with only two legs that do not form a single loop can be removed. (rule 3) 

The second rule is a consequence of the fact that for any $M \times M$-matrix $f$ 

$$\sum_{n,m=1}^{M} w_n w_m\,\mathcal{B}_{n,m}\,f_{n,m} = \sum_{n=1}^{M} w_n f_{n,n} - \sum_{n,m=1}^{M} w_n w_m f_{n,m} , \qquad (6.54)$$

and states that the contribution of a diagram is the same as that of the diagram in which a boson 
line is contracted and the two vertices, connected to that line, are fused together to form one 
vertex, minus the contribution of the diagram in which the line is simply removed and the vertices 
replaced by vertices with one boson leg less. This rule can be depicted as follows 

[diagrammatic depiction of the contraction rule] (rule 4) 



By repeated application of these rules, we see that the contribution of a connected bosonic 
diagram is equal to the contribution of a sum of products of so-called daisy diagrams¹, which are 
of the type 

[daisy diagram: a single vertex carrying $p$ boson loops, labeled $1,2,\ldots,p$] . (6.55) 

They are characterized by the fact that all lines begin and end on the same vertex and form single 
loops. The contribution of such a diagram is given by 

$$d_p(M) = \sum_{n=1}^{M} w_n\,\mathcal{B}_{n,n}^{\,p} = \sum_{q=0}^{p}\binom{p}{q}(-1)^{p-q}M_q = M_p\,[1 + O(M^{-1})] , \qquad (6.56)$$

where the last equation follows from Eq. (6.50). The maximal number of leaves in a product in 
the sum of daisy diagrams is equal to the number of loops $L_b$ in the original diagram, so that 

the contribution of a diagram with $L_b$ boson loops is $M_{L_b}[1 + O(M^{-1})]$. (rule 5) 

The leading order contribution of a diagram with $L_b$ boson loops is thus of the order of $M^{L_b}$. 
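
The binomial form of the daisy contribution can be verified with exact rational arithmetic. The sketch below assumes the diagonal propagator $\mathcal{B}_{n,n} = 1/w_n - 1$ and the objects $M_q = \sum_n w_n^{1-q}$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def M_q(w, q):
    """M_q = sum_n w_n^(1-q); in particular M_0 = 1 and M_1 = M."""
    return sum(p ** (1 - q) for p in w)


def daisy_direct(w, p):
    """sum_n w_n * (1/w_n - 1)^p, the daisy diagram with p leaves."""
    return sum(wn * (1 / wn - 1) ** p for wn in w)


def daisy_binomial(w, p):
    """sum_q binom(p, q) (-1)^(p-q) M_q."""
    return sum(comb(p, q) * (-1) ** (p - q) * M_q(w, q) for q in range(p + 1))


if __name__ == "__main__":
    w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    for p in range(1, 6):
        print(p, daisy_direct(w, p), daisy_binomial(w, p), M_q(w, p))
```

For equal bins the daisy contribution is exactly $(M-1)^p$, which shows the leading behavior $M^p$ together with relative corrections of order $1/M$.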

6.2.2.1 Extra rule if hp = 1 

If $h_p = 1$ $\forall p \ge 1$, then all kinds of cancellations between diagrams occur, because in those cases 
$M_p \sim M^p$ $\forall p \ge 1$. As a result of this, the contribution of a daisy diagram is $d_p(M) \sim M^p$, and 
we can deduce the following rule: the contribution of a diagram that falls apart in disjunct pieces 
if a vertex is cut, is equal to the product of the contributions of those disjunct pieces times one 
plus vanishing corrections. Diagrammatically, the rule looks like 

[diagram in which $A$ and $B$ share one vertex] $\sim$ [$A$] $\times$ [$B$] . (6.57) 

In Section 6.1.1.4 we called discrepancies for which Eq. (6.57) is exact one-vertex decomposable, 
and have shown that for those discrepancies only the one-vertex irreducible diagrams contribute, 
i.e., diagrams that do not fall apart in pieces containing bosonic parts if a vertex is cut. The 
previous rule tells us that, if $h_p = 1$ $\forall p \ge 1$, then 

$\log G(z) \sim$ sum of all connected one-vertex irreducible diagrams. (rule 6) 

The connected one-vertex irreducible diagrams we call relevant and the others irrelevant. 

^For example ■ ~ {/ \; — i---'-. = — 2i. = — C>Cj ~ ^ (.',";<"'; — '.','"".') 
6.2.3 Loop analysis

We want to determine the contribution of the diagrams in this section, and in order to do that, we 
need to introduce some notation: 



$L_b$ = the number of boson loops ; (6.58)
$L_f$ = the number of fermion loops ; (6.59)
$L$ = the total number of loops ; (6.60)
$l_b$ = the number of bosonic lines ; (6.61)
$l_f$ = the number of fermionic lines ; (6.62)
$v$ = the number of vertices ; (6.63)
$L_m$ = $L - L_b - L_f$ = number of mixed loops . (6.64)



These quantities are in principle functions of the diagrams, but we will never denote this depen- 
dence explicitly, for it will always be clear which diagram we are referring to when we use the 
quantities. 

With the foregoing, we deduce that the contribution $C_A$ of a connected diagram A with no
external legs satisfies

$C_A \asymp M^{L_b} N^{L_f} g^{2 l_b}$ . (6.65)

The Feynman rules and basic graph theory tell us that, for connected diagrams with no external
legs, $v = l_f$ and $L = l_b + l_f - v + 1$, so that

$l_b = L - 1 = L_b + L_f + L_m - 1$ . (6.66)

If we furthermore use that $g = \sqrt{2z/N}$, we find that the contribution is given by

$C_A \asymp z^{L-1} \left(\frac{M}{N}\right)^{L_b} N^{1-L_m}$ . (6.67)

Notice that this expression does not depend on $L_f$. Furthermore, it is clear that, for large M and
N, the largest contribution comes from diagrams with $L_m = 0$. Moreover, we see that we must
have $N = O(M)$, for else the contribution of higher-order diagrams will grow with the number
of boson loops, and the perturbation series becomes completely senseless. If, however, $N \asymp M$,
then the contribution of each diagram with $L_m = 0$ is more important than the contribution of
each of the diagrams with $L_m > 0$. Finally, we also see that the contribution of the $O(M^{-1})$-corrections
of a diagram (Eq.(6.56)) is always negligible compared to the leading contribution of
each diagram with $L_m = 0$. These observations lead to the conclusion that, if N and M become
large with $N \asymp M$, then the leading contribution to $\log G(z)$ comes from the diagrams with
$L_m = 0$, and that there are no corrections to these contributions. If we assume that M/N is
small, then the importance of these diagrams decreases with the number of boson loops $L_b$ as
$(M/N)^{L_b}$.
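The graph-theoretic relation used above, $L = l_b + l_f - v + 1$ for a connected diagram, is the usual count of independent loops of a multigraph (lines minus vertices plus connected components). A minimal Python sketch of that count, assuming a diagram is encoded simply as a list of vertex pairs; this encoding and the name `loop_count` are illustrative and not part of the formalism:

```python
def loop_count(num_vertices, lines):
    """Number of independent loops of a multigraph: lines - vertices + components."""
    parent = list(range(num_vertices))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_vertices
    for a, b in lines:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(lines) - num_vertices + components


# Hypothetical examples: two vertices joined by three lines, and a single
# vertex carrying two lines that begin and end on it (a two-leaf daisy).
theta = [(0, 1), (0, 1), (0, 1)]
daisy = [(0, 0), (0, 0)]
print(loop_count(2, theta), loop_count(1, daisy))
```

For a connected diagram the number of components is one, which reproduces the formula quoted in the text.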
6.2.3.1 The loop expansion of $\log G(z)$



Now we calculate the first few terms in the loop expansion of $\log G(z)$. We start with the diagrams
with one loop (remember that it is an expansion in boson loops and that we only have
to calculate connected diagrams for $\log G(z)$). The sum of all 1-loop diagrams with $L_m = 0$ is
given by the l.h.s. of Eq. (6.4), resulting in the r.h.s. of Eq. (6.25). To calculate the higher loop
diagrams, we introduce the effective vertex of Eq. (6.5) and the partly re-summed propagator



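[diagram: partly re-summed propagator between $n$ and $m$] := [bare propagator $n$, $m$] + [propagator with one insertion] + [propagator with two insertions] + $\cdots$

$= \sum_{p=0}^{\infty} (N g^2)^p \times$ [bare propagator $n$, $m$] $= \frac{1}{1-2z} \times$ [bare propagator $n$, $m$] . (6.68)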



The contribution of the 2-loop diagrams with $L_m = 0$ is given by

gOO + gOOO + gO-O + 



1 Ng^M^ 1 Ng^Ma 1 (Ng3)2(M2- 
'8 (1 -IzY ^ 8 (1 -2z)2 + 8 (1 -2z)3 



1 

N 



1 



M2-M2)ti(z)^ + 



5M2 



24 



8 



where we define 



Iz 
1 -2z 



12 (1 -2z]3 
[1+0(M-1)] , 



[1 +0(M-^)] 
(6.69) 

(6.70) 



Notice that the first three diagrams vanish if $h_p = 1\ \forall p \geq 1$. The contribution of the 3-loop
diagrams with $L_m = 0$ is given by



+ 



+ 



+ 



24cM0 ' 48^^ 
1 



+ 



1 



16 




+ 



8 




+ 



16 




(6.71) 



12000 + TgOOoOo • 

If $h_p = 1\ \forall p \geq 1$, then only the first four diagrams are relevant, and their contribution $C$ satisfies



(6.72) 
6.2.4 Various limits

In the previous calculations, M/N was the expansion parameter and the expansion of the generating
function only makes sense if it is considered to be small. Furthermore, a limit in which
$M \to \infty$ does not exist, because the zeroth order term is proportional to M. In order to analyze
limits in which M as well as N go to infinity, we can go over to the standardized variable
$(D_N - E)/\sqrt{V}$ of the discrepancy (Section 2.1.2), where

$E := E(D_N) = M - 1$ , (6.73)
$V := V(D_N) = 2(M-1) + \frac{M_2 - M^2 - 2M + 2}{N}$ . (6.74)
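For small N and M, the expectation value and variance of the $\chi^2$-statistic can be checked exactly by enumerating all ways to distribute the points over the bins. The following Python sketch does this with rational arithmetic. It assumes that $D_N = \frac{1}{N}\sum_n (c_n - N w_n)^2 / w_n$, with $c_n$ the number of points in bin n and $w_n$ the bin size, and that $M_2 = \sum_n 1/w_n$; the bin sizes and the function name `chi2_moments` are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def chi2_moments(weights, n_points):
    """Exact mean and variance of D_N = (1/N) * sum_n (c_n - N w_n)^2 / w_n,
    where c_n counts how many of the N independent points fall in bin n."""
    m = len(weights)
    mean = Fraction(0)
    second = Fraction(0)
    # Enumerate every assignment of the N points to the M bins.
    for bins in product(range(m), repeat=n_points):
        prob = Fraction(1)
        counts = [0] * m
        for b in bins:
            prob *= weights[b]
            counts[b] += 1
        d = sum((c - n_points * w) ** 2 / w for c, w in zip(counts, weights)) / n_points
        mean += prob * d
        second += prob * d * d
    return mean, second - mean ** 2


w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative bin sizes
N, M = 4, len(w)
M2 = sum(1 / wn for wn in w)                          # assumed M_2 = sum_n 1/w_n
mean, var = chi2_moments(w, N)
print(mean == M - 1)
print(var == 2 * (M - 1) + (M2 - M**2 - 2 * M + 2) / N)
```

The enumeration grows as $M^N$, so this is only meant as a consistency check for very small cases.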

The generating function of the probability distribution of the standardized variable is given by

$\hat{G}(\xi) = E\left(\exp\left(\xi\,\frac{D_N - E}{\sqrt{V}}\right)\right) = \exp\left(-\frac{E\xi}{\sqrt{V}}\right) G\left(\frac{\xi}{\sqrt{V}}\right)$ . (6.75)

Instead of the parameter z, the parameter $\xi = z\sqrt{V}$ is considered to be of $O(1)$ in this perspective,
in the sense that it is these values of $\xi$ that give the important contribution to the inverse Laplace
transform to get from the generating function to the probability density, and the contribution of
a diagram changes from (6.67) to

. (6.76) 



In the following we will investigate limits of $M \to \infty$ with, at first instance, the criterion of
Eq. (6.45) as only restriction. The fact that the variance V shows up explicitly in the contribution
of the diagrams forces us to specify the behavior of $M_2$ more precisely. We will take

$M_2 - M^2 \asymp M^{\gamma}$ , $0 \leq \gamma \leq 2$ . (6.77)

Notice that $h_2 = 1$ if $\gamma < 2$ and that $h_2$ does not exist if $\gamma \geq 2$. Furthermore, we cannot read
off the natural expansion parameter from the contribution of the diagrams anymore, and have to
specify the behavior of N. We will only consider limits in which

$N \asymp M^{\alpha}$ , $\alpha \geq 0$ . (6.78)



Although they are a small subset of possible limits, those that can be specified by a pair (a,y) 
show an interesting picture. We will derive the results in the next section, but present them now 
in the following phase diagram: 
It shows the region $S = \{(\alpha,\gamma) \in \mathbb{R}^2 \mid \alpha \in [0,\infty),\ \gamma \in [0,2]\}$ of the real $(\alpha,\gamma)$-plane. In this
region, there is a critical line $\ell$, given by

$\ell := \{(f_c(t),t) \in S \mid t \in [0,2]\}$ with $f_c(t) := \begin{cases} \frac{1}{2} & \text{if } 0 \leq t \leq \frac{3}{2} , \\ 2 - t & \text{if } \frac{3}{2} \leq t \leq 2 . \end{cases}$ (6.79)

It separates S into two regions T and U, neither of which contains $\ell$. Our results are the following.
Firstly,
Firstly, 

in the region T, the limit of $M \to \infty$ is not defined. (6.80)

In this region, the standardized variable is not appropriate, and we see that there are too many
diagrams that grow indefinitely with M. Secondly,

in the region U, the limit of $M \to \infty$ gives a Gaussian distribution. (6.81)

Because we used the standardized variable, this distribution has necessarily zero expectation and
unit variance. Finally,

on the line $\ell$, various limits exist, depending on the behaviour of $M_p$, $p > 2$. (6.82)

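One of these limits we were able to calculate explicitly. It appears if $M_p - M^p \prec M^{p-1/2}\ \forall p \geq 1$,
which is, for example, satisfied in the case of equal binning. In this limit, the generating function
$\hat{G}$ of the standardized variable is given by

$\log \hat{G}(\xi) = \frac{1}{\lambda^2}\left(e^{\lambda\xi} - 1 - \lambda\xi\right)$ , $\lambda := \lim_{M\to\infty} \frac{\sqrt{2M}}{N}$ . (6.83)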

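In Appendix 6B, we show that the probability distribution H belonging to this generating function,
which is the inverse Laplace transform, is given by

$H(\tau) = \sum_{n \in \mathbb{N}} \delta\left(\tau - n\lambda + \frac{1}{\lambda}\right) \frac{1}{n!} \left(\frac{1}{\lambda^2}\right)^{n} \exp\left(-\frac{1}{\lambda^2}\right)$ . (6.84)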
It consists of an infinite number of Dirac delta-distributions, weighted with a Poisson distribution.
The delta-distributions reveal the fact that, for finite N and M, the Lego discrepancy, and also the
$\chi^2$-statistic, can only take a finite number of values, so that the probability density should consist
of a sum of delta-distributions. In the usual limit of $N \to \infty$, the discrete nature of the random
variable disappears, and the $\chi^2$-distribution is obtained. In our limit, however, the discrete nature
does not yet disappear. A continuous distribution is obtained if $\lambda \to 0$, which corresponds to
going over from $\alpha = \frac{1}{2}$ to $\alpha > \frac{1}{2}$. Then the generating function of the standardized variable tends to $\exp(\frac{1}{2}\xi^2)$.
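The structure of this limit can be illustrated numerically. The sketch below assumes the reading of Eq.(6.83) and Eq.(6.84) in which the limiting variable is $\tau = n\lambda - 1/\lambda$ with n Poisson distributed with mean $1/\lambda^2$, so that the logarithm of the generating function is $(e^{\lambda\xi} - 1 - \lambda\xi)/\lambda^2$. It compares the direct Poisson sum with this closed form and with the Gaussian value $\xi^2/2$; the function names and the chosen values of $\xi$ and $\lambda$ are illustrative.

```python
import math


def log_g_closed_form(xi, lam):
    """Closed form (1/lam^2) * (exp(lam*xi) - 1 - lam*xi)."""
    return (math.exp(lam * xi) - 1.0 - lam * xi) / lam**2


def log_g_from_poisson(xi, lam, n_max=200):
    """log E[exp(xi * T)] with T = n*lam - 1/lam and n Poisson with mean 1/lam^2."""
    mu = 1.0 / lam**2
    total = 0.0
    for n in range(n_max):
        # Poisson weight, evaluated in log space to avoid overflow of n!.
        log_weight = -mu + n * math.log(mu) - math.lgamma(n + 1)
        total += math.exp(log_weight + xi * (n * lam - 1.0 / lam))
    return math.log(total)


xi = 0.7
for lam in (1.0, 0.5, 0.2):
    print(lam, log_g_from_poisson(xi, lam), log_g_closed_form(xi, lam), 0.5 * xi**2)
```

As $\lambda$ decreases, the first two columns stay equal to each other and approach the Gaussian value in the last column.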



6.2.5 Derivation of the various limits 

We will deal with the cases $\gamma = 2$, $\gamma - \alpha \leq 1$ and $\gamma - \alpha > 1$ separately.



6.2.5.1 $\gamma = 2$

We distinguish the three cases $0 < \alpha < 1$, $\alpha = 1$ and $\alpha > 1$.

If $\alpha > 1$, then $V \asymp M$, and the contribution $C_A$ of a diagram A satisfies $C_A \asymp M^{\beta}$, with

$\beta = \left(\frac{1}{2} - \alpha\right) L_b - \frac{1}{2} L_f + \left(\alpha + \frac{1}{2}\right)(1 - L_m)$ . (6.85)

A short analysis shows that only diagrams with $(L_b, L_f, L_m) = (1,1,0)$ or $(L_b, L_f, L_m) = (1,2,0)$
give a non-vanishing contribution, and those diagrams are

1 ^ 'N(M-1)2^ ^ E£ 

2 2 Nv^ VV 

The first diagram gives a contribution that is linear in $\xi$, and cancels with the exponent in Eq.(6.75).
This has to happen for every value of $\alpha$, and as we will see, this diagram will always occur. Notice
that the diagrams above are the first two diagrams in the series on the l.h.s. of Eq. (6.4).
The logarithm of the generating function becomes quadratic, so that the probability distribution
becomes Gaussian.

If $\alpha = 1$, then again $V \asymp M$, so that $\beta = -\frac{1}{2}(L_b + L_f) + \frac{3}{2}(1 - L_m)$, and we have to add the
diagrams with $(L_b, L_f, L_m) = (2,1,0)$:

1 ^ 1 1 N(M2-MW (M2-M^)£,^ 

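If $0 < \alpha < 1$, then $V \asymp M^{2-\alpha}$ and $\beta = -\frac{\alpha}{2} L_b - \left(1 - \frac{\alpha}{2}\right) L_f - \left(\frac{\alpha}{2} + 1\right) L_m + \frac{\alpha}{2} + 1$, so that, besides
the diagram of Eq. (6.86), only the diagrams of Eq. (6.88) give a non-vanishing contribution, and
this contribution is equal to $\xi^2/2$.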
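The power counting in this subsection can be reproduced mechanically. The Python sketch below assumes $C_A \asymp M^{L_b} N^{L_f} g^{2 l_b}$ with $g^2 \asymp 1/(N\sqrt{V})$, $l_b = L_b + L_f + L_m - 1$, $N \asymp M^{\alpha}$ and $V \asymp M^{v}$, where the exponent $v$ (called `v_exp` here) has to be supplied for the regime at hand. It only counts powers of M on a small grid of loop numbers; it does not know about cancellations between diagrams, nor whether a diagram with given loop numbers actually exists.

```python
from fractions import Fraction


def beta(alpha, v_exp, L_b, L_f, L_m):
    """Exponent beta in C_A ~ M^beta, assuming N ~ M^alpha, V ~ M^v_exp and
    C_A ~ M^L_b * N^L_f * g^(2*l_b) with g^2 ~ 1/(N*sqrt(V))."""
    l_b = L_b + L_f + L_m - 1
    return L_b + alpha * L_f - (alpha + v_exp / 2) * l_b


def non_vanishing(alpha, v_exp, max_loops=4):
    """Loop counts (L_b, L_f, L_m) with beta >= 0, scanned over a small grid."""
    return [
        (L_b, L_f, L_m)
        for L_b in range(1, max_loops + 1)
        for L_f in range(1, max_loops + 1)
        for L_m in range(0, max_loops + 1)
        if beta(alpha, v_exp, L_b, L_f, L_m) >= 0
    ]


print(non_vanishing(Fraction(2), Fraction(1)))         # alpha > 1, V ~ M
print(non_vanishing(Fraction(1), Fraction(1)))         # alpha = 1, V ~ M
print(non_vanishing(Fraction(1, 2), Fraction(3, 2)))   # alpha = 1/2, V ~ M^(2 - alpha)
```

In each case the triple (1, 1, 0) corresponds to the diagram that is linear in $\xi$ and cancels against the exponent of the standardized variable.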
6.2.5.2 $\gamma - \alpha \leq 1$

In this case, $V \asymp M$, and the contribution $C_A$ of a diagram A satisfies $C_A \asymp M^{\beta}$ with

$\beta = \left(\frac{1}{2} - \alpha\right) L_b - \frac{1}{2} L_f + \left(\alpha + \frac{1}{2}\right)(1 - L_m)$ . (6.89)

If $\alpha < \frac{1}{2}$, then $\beta$ increases with the number of boson loops $L_b$, and we are not able to calculate
the limit of $M \to \infty$.

If $\alpha > \frac{1}{2}$, then the only diagrams that have a non-vanishing contribution are those with
$(L_b, L_f, L_m) = (1,1,0)$, $(1,2,0)$ or $(2,1,0)$. These are exactly the diagrams of Eq. (6.86),
Eq. (6.87) and Eq. (6.88). Notice, however, that the diagrams of Eq. (6.88) cancel if $\gamma - \alpha < 0$:
then they are irrelevant. The resulting asymptotic distribution is Gaussian again.

If $\alpha = \frac{1}{2}$, then $L_b$ disappears from the equation for $\beta$, and we obtain a non-Gaussian asymptotic
distribution. The diagrams that contribute are those with $(L_f, L_m) = (1,0)$ or $(2,0)$.
There is, however, only one relevant diagram with $(L_f, L_m) = (1,0)$, namely the diagram of
Eq.(6.86) that gives the linear term. We have to be careful here, because the other diagrams with
$(L_f, L_m) = (1,0)$ still might be non-vanishing. A short analysis shows that they are given by the
sum of all ways to attach daisy diagrams to one fermion loop, and that their contribution is given by

C,(M,=NIog(,,I.ii5M) . (,,0, 

We know that, if $h_p = 1$, then $d_p(M) = M^p[1 + \varepsilon_p(M)]$ with $\lim_{M\to\infty} \varepsilon_p(M) = 0$, so that

Ci(M) = lNMg^ + Nlog(|l+e-i^9^f_ii^^^^ . (6.91) 

The first term gives the leading contribution; the contribution of the relevant diagram, which
consists of a boson loop and a fermion loop attached to one vertex. The second term is irrelevant
with respect to the first, but can still be non-vanishing, depending on the behavior of $\varepsilon_p(M)$.
Remember that $\alpha = \frac{1}{2}$ and $V \asymp M$, so that $M g^2 = 2\xi M/(N\sqrt{V}) \to$ constant, and we can see
that the contribution is only vanishing if

$\lim_{M\to\infty} N \varepsilon_p(M) = 0 \ \ \forall p \geq 1 \quad\Longleftrightarrow\quad M_p - M^p \prec M^{p-1/2} \ \ \forall p \geq 1$ . (6.92)

For $p = 1$ this relation is satisfied because $\varepsilon_1(M) = 0$. For $p = 2$ this relation is also satisfied if
$\gamma < \frac{3}{2}$.

If the relation is also satisfied for the other values of p, then the only diagrams that contribute
to the generating function are the relevant diagrams with $(L_f, L_m) = (2,0)$:



(6.93) 
where we used the effective vertex (6.5) again. The contribution of a diagram of this type with p
boson lines is given by

-L N^M,[1 +0(M-)] ^ ^1 . (6.94) 

2p! VnVV/ 2Mp!VNVvy 

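The factor $1/2p!$ is the symmetry factor of this type of diagram. If we sum the contribution of
these diagrams and use that $V \sim 2M$, we obtain for the generating function $\hat{G}$ of the standardized variable

$\log \hat{G}(\xi) \sim \frac{1}{\lambda^2}\left(e^{\lambda\xi} - 1 - \lambda\xi\right)$ , $\lambda := \lim_{M\to\infty} \frac{\sqrt{2M}}{N}$ . (6.95)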

6.2.5.3 $\gamma - \alpha > 1$

In this case, $V \asymp M^{\gamma-\alpha}$ and the contribution $C_A$ of a diagram A satisfies $C_A \asymp M^{\beta}$ with

$\beta = \left(1 - \frac{\gamma+\alpha}{2}\right) L_b - \frac{\gamma-\alpha}{2} L_f + \frac{\gamma+\alpha}{2}(1 - L_m)$ . (6.96)

If $\gamma + \alpha < 2$, then $\beta$ increases with the number of boson loops $L_b$, and we are not able to
calculate the limit of $M \to \infty$.

If $\gamma + \alpha > 2$, then the only diagrams that have a non-vanishing contribution are those
with $(L_b, L_f, L_m) = (1,1,0)$, $(1,2,0)$ or $(2,1,0)$. These are exactly the diagrams of Eq. (6.86),
Eq. (6.87) and Eq. (6.88). Notice, however, that the diagrams of Eq. (6.88) cancel if $\gamma - \alpha < 0$:
then they are irrelevant. The resulting asymptotic distribution is Gaussian.

If $\gamma + \alpha = 2$, then $\beta = (\alpha - 1)L_f + 1 - L_m$. Because $\gamma - \alpha > 1$, we have $\alpha < \frac{1}{2}$, and
non-vanishing diagrams have $(L_f, L_m) = (1,0)$. Their contribution is given by the r.h.s. of Eq. (6.91),
the first term of which gives the term linear in $\xi$. The second term is non-vanishing, because
$M g^2 \asymp M^{1-(\gamma+\alpha)/2} =$ constant and $N\varepsilon_2(M) \asymp$ constant.



6.3 Stronger-than-weak limits for diaphony 

In [26], it is proven that the standardized variable of the Fourier diaphony (Section 3.3.3) converges
in distribution to a Gaussian variable if the number N of points in the point set, and with
it the number $s_N$ of dimensions of the integration region, goes to infinity such that $c^{s_N}/N$ goes
to zero, where



' ^ 45 



945 I 945 

= 1.79218- •• . (6.97) 



To be more precise, if $s_1, s_2, \ldots$ is a nondecreasing sequence such that

$\limsup_{N\to\infty} \frac{c^{s_N}}{N} = 0$ then $\frac{D_N - E(D_N)}{\sqrt{V(D_N)}} \xrightarrow{d}$ Normal , (6.98)

where we include in the notation "Dn" the dependence on N through sn. The proof makes use 
of the Central Limit Theorem as given in Section 2.1.5.1, and is, roughly speaking, based on the 
fact that the conditions in the theorem are satisfied if E(D^)/V(Dn)-^ — > 0. 
6.3.1 The observation

The Central Limit Theorem provides a weak limit for the distribution, which becomes dramatically
clear in a short calculation. The expectation value and the variance of the diaphony are
given by $E(D_N) = \int_K \mathcal{B}(x,x)\,dx$ and $V(D_N) = 2\,\frac{N-1}{N}\int_{K^2} \mathcal{B}(x,y)^2\,dx\,dy$, leading to

$E := E(D_N) = 1$ , $V_N := V(D_N) = 2\,\frac{N-1}{N}\,\frac{\left(1 + \frac{\pi^4}{45}\right)^{s_N} - 1}{\left[\left(1 + \frac{\pi^2}{3}\right)^{s_N} - 1\right]^2}$ . (6.99)



The moments of the diaphony contain contributions of all kinds of convolutions of the reduced
two-point function $\mathcal{B}$. One such convolution for the fifth moment $E(D_N^5)$ is given by



S(x,v]5dxdv 



K 



yt -r 9 189 ^ 189 ^ 



18711 ) ^ \^ ' 15 """^ 



Sn 



945 



Sn 



io(i + S 



Sn 



+ 4 



1+T 



945 J 

Sn 



1 



-5 



If $s_N$ becomes large, the leading contribution in the expression above is given by the first term.
The convolution contributes to $E(D_N^5)$ with a third power of 1/N, and this means that the fifth
moment of the standardized variable behaves at least as

Sn 



1 + 



189 



7t° 

189 



4n'" 
1871 1 



Sn 



> 



1 



N3V; 



5/2 
N 



N 



2(N-1) 



5/2 



if N becomes large, and calculation of the other contributions to $E(D_N^5)$ shows that this behavior
is not canceled. So clearly, the fifth moment of the standardized variable may explode in the
weak limit of (6.98).



6.3.2 The statement 

In this section, we derive a limit in which all moments of the standardized variable converge to
the moments of a normal variable, which is therefore 'stronger' than the weak limit, in the sense
that if $s_N$ grows with N such that the 'strong' limit appears, it grows such that the criterion of
(6.98) is certainly satisfied (Section 2.1.7). Actually, we will see that the 'strong' limit appears
under the same type of condition, but of course with a constant $c_s > c$. Our exact statement
shall be that if $s_N \to \infty$ as $N \to \infty$ such that

$\limsup_{N\to\infty} \frac{c_s^{s_N}}{N} = 0$ (6.100)

then all moments of the standardized variable converge to the moments of a normal variable. We
shall show that it works for $c_s \geq \alpha^{4/3}$ and probably even for $\alpha \leq c_s < \alpha^{4/3}$, where

$\alpha := \left(1 + \frac{\pi^2}{3}\right)\left(1 + \frac{\pi^4}{45}\right)^{-1/2}$ . (6.101)

Note that $\alpha = 2.41146\cdots$ and $\alpha^{4/3} = 3.23376\cdots$.
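The two numbers quoted here are easy to reproduce. The short Python check below assumes that the constant of Eq. (6.101) is $(1 + \pi^2/3)(1 + \pi^4/45)^{-1/2}$ and that the second number is its power 4/3; under these assumptions the quoted digits are recovered.

```python
import math

# Constant of Eq. (6.101), assumed to be (1 + pi^2/3) * (1 + pi^4/45)^(-1/2).
alpha = (1 + math.pi**2 / 3) / math.sqrt(1 + math.pi**4 / 45)

print(f"alpha       = {alpha:.6f}")
print(f"alpha^(4/3) = {alpha ** (4 / 3):.6f}")
```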
6.3.3 The scenario

The logarithm $\hat{W} := \log \hat{G}$ of the generating function of the standardized variable is given by

$\hat{W}(\xi) = \log E\left(\exp\left(\xi\,\frac{D_N - E}{\sqrt{V_N}}\right)\right) = -\frac{E\xi}{\sqrt{V_N}} + W\left(\frac{\xi}{\sqrt{V_N}}\right)$ , (6.102)

where $W := \log G$ is the logarithm of the generating function of the probability distribution
of $D_N$. So $W(\xi/\sqrt{V_N})$ can be calculated using the Feynman rules of Section 3.2.4 if $g := \sqrt{2z/N}$ is
replaced by $g := \sqrt{2\xi/(N\sqrt{V_N})}$. This is equivalent to using $g := \sqrt{2\xi/N}$ and replacing
the propagator $\mathcal{B}$ by $\hat{\mathcal{B}} := \mathcal{B}/\sqrt{V_N}$, and this is what we shall do. Only connected diagrams
contribute to $\hat{W}(\xi)$, and there is only one diagram that gives a contribution linear in $\xi$, namely



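$\frac{1}{2}$ [diagram: one fermion loop with one boson loop attached] $= \frac{N g^2}{2} \int_K \hat{\mathcal{B}}(x,x)\,dx = \frac{E\xi}{\sqrt{V_N}}$ , (6.103)

which cancels against the term $-E\xi/\sqrt{V_N}$ in Eq. (6.102). The second-order in $\xi$ is given by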



^^--^ + - — 4 — 



c2 

3(x,-y)2dxdtj = ^ 

K 



(6.104) 



Note that the third and the fourth diagram cancel each other. These results for the first two
orders in $\xi$ are in correspondence with the fact that we use the standardized variable. If we find
a criterion dictating how $s_N \to \infty$ as $N \to \infty$ such that the contribution of all other diagrams
vanishes, regardless of the value of $\xi$, then this vanishing happens order by order in $\xi$ because
each order consists of a sum of diagrams. This then implies that all moments of the standardized
variable converge to the moments of a normal variable.

6.3.4 The calculation 

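In order to calculate the diagrams, we expand $\hat{\mathcal{B}}$ in terms of the complex Fourier basis

$\hat{\mathcal{B}}(x,y) = \sum_{\vec{n}} \hat{\sigma}_{\vec{n}}^2\, e^{2\pi i\,\vec{n}\cdot(x-y)}$ , (6.105)

where the sum is over all $\vec{n} \in \mathbb{Z}^s$ except the constant mode $\vec{n} = (0, 0, \ldots, 0)$, and we denote
$\vec{n}\cdot x := \sum_{\nu=1}^{s} n_\nu x_\nu$. The strengths $\hat{\sigma}_{\vec{n}}$ are given by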



where 



.2 ■> FT 1 



ru, if ^ , 



Tn := Pn 



Pn:=2|1-- 



(6.106) 



(6.107) 






Notice that, because we absorbed the factor $1/\sqrt{V_N}$ in $\hat{\mathcal{B}}$, the strengths depend on N. Because we
expanded $\hat{\mathcal{B}}$ in terms of the complex exponentials, convolutions of this two-point function can
be calculated as sums over products of the strengths $\hat{\sigma}_{\vec{n}}^2$. As a result of this, we can go over to
another boson propagator

ft m = a|6ft_m , (6.108) 

and the rule that in a vertex with k boson legs, boson propagators have to be convoluted as 

H Siti -Ai -A2 • • • -m^ (Ttl + ""^2 + • • • + flk = 0) , (6. 109) 

fil ,n.2,... ,ftk 

where the logical $\theta$-function expresses that the sum of the labels has to be zero. These Feynman
rules give the same result as the original rules.

Because the Fourier diaphony is translation invariant, it is one-vertex decomposable, so that
we only have to consider one-vertex irreducible (1VI) diagrams (Section 6.1.1.4). The other
diagrams cancel exactly. For each 1VI diagram that has vertices that are not of the type of
Eq.(6.5), there exists a diagram that has exactly the same bosonic part, but only effective vertices
of the type of Eq. (6.5), and therefore carries a smaller power of 1/N. Combining this with the
fact that we only have to consider connected diagrams to calculate $\hat{W}(\xi)$, we see that

for the limit of large N, we only have to consider connected 1VI diagrams
with all vertices of the type of Eq. (6.5), which we call relevant. (6.110)

The power of 1/N that is carried by a relevant diagram is given by $1/N^{p/2-v}$, where p is the
sum of all bosonic legs of all vertices, and v the number of vertices. Basic graph theory tells us
that the number of bosonic lines $l$ is equal to $p/2$, and the number of bosonic loops L is equal to
$l - v + 1$, so that

the power of 1/N carried by a relevant diagram is $1/N^{L-1}$, (6.111)
where L is the number of bosonic loops.

So the natural way to order the diagrams is by number of bosonic loops. From now on, the drawing
of a diagram only represents the contribution coming from the bosonic part of the diagram,
stripped from its factors of $g = \sqrt{2\xi/N}$, its factors of N coming from the fermionic piece, and
the symmetry factors, which we call the bare contribution.

6.3.4.1 One loop 



Diagrams with no loops do not exist (because of the Landau gauge), and the relevant diagrams 
with only one loop contribute with 
where



00-1 /-» 



n=1 



n=1 



(6.113) 



Using the line of argument with Eq. (5.14) and Eq. (5.15), we derive that yp+i < yp. Explicit 
calculation shows that = + 1 ) + 1 ]^^^ < 1 , so that yp < 1 for all p > 2, and 



the contribution of all one-loop diagrams with more than two vertices
vanishes if $N \to \infty$ and $s \to \infty$. (6.114)

The diagrams with one or two vertices contribute to the first two powers in $\xi$ (Eq. (6.103) and
Eq. (6.104)). If $N \to \infty$ and s stays finite, only one-loop diagrams do not vanish, and the analysis
above was just a repetition of what was done in Chapter 5.

6.3.4.2 More than one loop 

In order to estimate the contribution of the higher-loop diagrams, we observe that, because < 
&i < 1 for all n and all values of N, 

Y_ e(ni+mi =0)6-^^6^, ,_T^2 0(fl^2 + m2 = 0] < ^ e(ni + m = 0] 6(1x2 - m = 0) 



TTll ,Tn.2 



= Q[n^ +n2 = 0) 



(6.115) 



As a result of this, the bare contribution of a relevant diagram can be estimated by repeated 
application of the operation 



(6.116) 



until only one vertex remains. This operation leaves the number of loops L invariant, so that the 
bare contribution of a relevant diagram is smaller than 



1 

2 ,.. ( '},...L 



(6.117) 



where 

d(N; 



Pn 



1/2 



1 + 



2 \s 



1 



1+T 



1 



1 + 



1 



-1/2 



(6.118) 



Using this result, (6.110) and (6.111), we conclude that if $s, N \to \infty$, then the behavior of the
contribution of any 1VI diagram with $L > 1$ loops is dominated by

$\frac{\alpha^{sL}}{N^{L-1}}$ where $\alpha := \left(1 + \frac{\pi^2}{3}\right)\left(1 + \frac{\pi^4}{45}\right)^{-1/2}$ , (6.119)

so that all diagrams with more than one loop vanish if $s = s_N \to \infty$ as $N \to \infty$, such that

$\limsup_{N\to\infty} \frac{c_s^{s_N}}{N} = 0$ where $c_s = \alpha^2$ . (6.120)

We have already established that, in the expansion of $\hat{W}(\xi)$ in $\xi$, there is no linear term, and the
quadratic term is given by $\xi^2/2$, as demanded by the fact that we are dealing with the standardized
variable. The one-loop diagrams contributing to the higher powers vanish if $s, N \to \infty$, and all
other diagrams vanish if (6.120) holds.



6.3.4.3 Leading contributions 

In the previous section we have put a bound on the contribution of each diagram, which resulted 
in (6.120). This result comes from the bound on the two-loop diagrams. For the lower-loop 
diagrams, however, the determination of the actual leading behavior is attainable. There is, for 
example, only one relevant two-loop diagram, which has the following bosonic structure: 



(6.121) 



Its bare contribution is 



S(x,-y)^dxdy = 



K 



1 + 



sn 



3/2 



(6.122) 



where 



:= (l + T5 + M)('+i) 



-3/2 



< cx 



3/2 



(6.123) 



so that it suffices to take $c_s = \alpha^{3/2}$ in (6.120). The relevant three-loop diagrams have bosonic
parts



(6.124) 



and using (6.115), we immediately see that the last two diagrams are bounded by the first, which
has a bare contribution



S(x,-y)4dxdij 



15 ■ 945 ■ 945 ;"-4 0+S + ffi)"+6(l+g)"-3 



1 -I- 27f^ -I- -L ^ 



K 



Pn 



1 +^ 

' ^ 45 



^,27^,87ti,^^^,7^^ 

y ^ 15 ^ 945 ^ 945^ ' ^ 45^ 



Sn 



-2' 



SN 



— Iy,SN 



(6.125) 
Application of (6.115) shows also that the bare contribution of the second diagram is bounded
by 

Jk 

(6.126) 

and we have < oc^^^ and m^^^ < oc^^^, so that Cs = a^^^ also suffices in (6.120). We suspect 
that this works to all loop-orders, and that we can actually take Cs = a. 

6.4 Conclusions 

We have presented finite-sample corrections to the probability distributions of quadratic discrepancies
under sets of N random points. The corrections are terms in a 1/N expansion of the
generating function of the probability distribution, consisting of the contribution of a finite number
of Feynman diagrams. We presented the diagrams up to and including the order of $1/N^2$
for the general case, and derived a rule of diagram cancellation in the case of special discrepancies,
which we call one-vertex decomposable. We have applied the formalism to the Lego
discrepancy, the $L_2$-discrepancy in one dimension and the Fourier diaphony in one dimension,
and calculated the first two terms in the expansion. For the Lego discrepancy, this resulted in
Eq.(6.25) and Eq.(6.27), for the $L_2$-discrepancy in Eq.(6.35) and Eq.(6.36), and for the Fourier
diaphony in Eq.(6.40) and Eq.(6.41). The Fourier diaphony and the Lego discrepancy with equal
binning are one-vertex decomposable. For the latter, we also calculated the $1/N^2$-term, which is
in correspondence with the result of an alternative calculation up to the order of $1/N^4$, given in
Appendix 6A.

In the second part of the chapter, we focused on the variant of the Lego discrepancy that
is equivalent to a $\chi^2$-statistic of N data points distributed over M bins. We have presented
a procedure to calculate the generating function perturbatively if M and N become large. The
natural expansion parameter we have identified to be M/N, and we have calculated the first few
terms in the series explicitly.

In order to calculate limits for the Lego discrepancy in which $N, M \to \infty$, we have introduced
the objects of Eq. (6.44) and restricted the behavior of the size of the bins such that they
satisfy Eq. (6.45). Furthermore, we have gone over to the standardized variable of the discrepancy.
For this variable, we have derived a phase diagram, representing the limits specified by
Eq.(6.77) and Eq.(6.78). We have formulated the results in (6.80), (6.81) and (6.82). One of these
results is that there are non-trivial limits if $N, M \to \infty$ such that $M^{\alpha}/N \to$ constant with $\alpha < 1$.
This result is in stark contrast with the rule of thumb that, in order to trust the $\chi^2$-distribution,
each bin has to contain at least a few data points.

Finally, we have derived a limit in which all the moments of the standardized variable of the
Fourier diaphony converge to the moments of a normal variable, which is given in (6.100).



-2 



:= Jm'N 






6.5 Appendices 

Appendix 6A 

If we define, for the Lego discrepancy with equal bins, $E = M - 1$, $\eta(z) = 2z/(1 - 2z)$ and

$(1 - 2z)^{E/2}\, G(z) = \sum_{n,p \geq 0} \frac{1}{N^n}\, C_n^{(p)}(E)\, \eta(z)^p$ , (6.127)

then the only non-zero $C_n^{(p)}$ with $n \leq 4$ are given by



^2 ^^^^-^^1^48^ 96^" + 48 

(E)=E('^e2-^E + | 



(E) = E ( — — E^ - ^E^ + - ^^E + 



1 ^ 1 



1 _ 5 



■V48 48 8 

1 7 71 7 



C'^^fE) -E( — -E^ + — E^-— E + — 
^2 ^^^-^1288 ^ 72 288^48 

C(%n - E ("-Le^ - IZe^ + ^E - 
^-3 ^^^-^[240^ 30^ ^UO^ 240 J 

p/ 53 3 1153 . 7423 527 

^3 ^''^"''1^576^ + 576"'' 48" 

C^'hv^-T^ ^ F4+461 3 _ 6581 8663 467 
~ ^[576 + 1152^ 1152" + 576 48 

^ ^ " Vn52" + 144^ 1152^ + 192 " 4 

C'^'fEl = E f-^E^ + ^E^ + ^E^ - I^E^ + !^E - ^ 
3 ^ ' \^0368 3456 3456 10368 576 72 



240 80 80 240 J 
1 ^4 349 3 7193 2 15283 67021 \ 



1440 960 720 320 1440 7 

pt^'fFl - F ^ 49 .4 _ 29069 3 372169 2 _ 571727 ^ 21503\ 
•-4 Itj-ti ^g^b ^^^^ b + ^^^^ b b+ 1 
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iS),^,_^( 13^5,13979. 2290601 3,1446743^2 



piojrpi _ £ _|_ — r 3 I P 

4 ^ ' V 23040 23040 92160 7680 

9583] 87 294773 \ 
23040 ^ 1152 ) 
rf^'fFl - F , 35077 4 _ 781079 . 993515 , _ 564301 24607\ 

4 ^''J ^"^6912 ^ 13824 13824 ^ 3456 1152 ^ 96 ; 

p(io),p. ^ . 1 p6 , _l3^p5 , 162721 _ 596467 3 
4 ^'^> 13824 ^ 3072 ^ 34560 9216 

1653251 ^2 253799^ 145199 



6912 768 960 



^ ^ V41472 13824 13824 41472 

19783^2 137875^ 1565 

H E E H 

192 1152 32 



497664 31104 82944 124416 

3942431 3 249239 2 250141 2575 \ 
497664 ^ 13824 ~ 13824 



Appendix 6B 

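We want to calculate the integral

$H(\tau) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{U(z)}\,dz$ , $U(z) = \frac{1}{\lambda^2}\left(e^{\lambda z} - 1 - \lambda z\right) - z\tau$ . (6.128)

We will make use of the fact that

$U\left(z + \frac{2\pi i n}{\lambda}\right) = U(z) - 2\pi i n\,\frac{1 + \lambda\tau}{\lambda^2}$

for all $n \in \mathbb{Z}$, so that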



(2n+1 )7il 



neZ 



^+^T 



27tt 



(2u-1 )7ri 



nez 



e^-(^5 dz 



(6.129) 



(6.130) 



Notice that the integral is independent of n, so that the sum can be interpreted as a sum of Dirac 
delta-distributions : 



neZ 



nez 



1 +At 



A2 



nez 









uA — ^ 




A 



(6.131) 
These delta-distributions restrict the values that $\tau$ can take. If we use these restrictions and do
the appropriate variable substitutions, the remaining integral in (6.130) can be reduced to



exp 



e 
A2 



ncp dcp 



6^2 



n+1 



dw 



(6.132) 



where $n \in \mathbb{Z}$ and the contour is closed around $w = 0$. According to Cauchy's theorem, the final
integral is only non-zero if $n \in \mathbb{N}$, and in that case its value is $\frac{2\pi i}{n!}\left(\frac{1}{\lambda^2}\right)^{n}$. The combination of
these results gives Eq.(6.84).



Chapter 7 

Phase space integration 



In particle physics, there is the need to integrate transition probabilities of particle processes over 
phase space, the space of all possible configurations of the final-state momenta (Section 1.2.2). 
This is usually done with the Monte Carlo method, and the first sections of this chapter deal 
with its basics and some useful techniques. The formalism converges towards the application for 
phase space in Section 7.4. 
7.1 Monte Carlo integration

For the Monte Carlo (MC) method of numerical integration, the integral of a function F over
an integration region M has to be reduced to the integral of a function f over an s-dimensional
hypercube $K := [0,1]^s$. In order to do so, a suitable mapping $\varphi : K \mapsto M$ and the normalization
function $g_\varphi$ have to be determined (Section 1.1.2). A conceptual help in the search for such
mappings is to consider them as algorithms to generate random variables with a certain probability
distribution. This probability distribution enters the integration problem as follows. Given a
probability density G on M, the integral of F over M can be written as



$\int_M F(y)\,dy = \int_M \frac{F(y)}{G(y)}\,G(y)\,dy$ , (7.1)



so that the integral can be interpreted as the expectation value of F/G under the probability
density G on M. The only restriction on G is that its support should contain the support of F.
The Monte Carlo method can then be directly applied to M. Let us denote the average of a
function w over the first N points of a sequence $y_1, y_2, \ldots$ in M distributed following G by

$G_N[w] := \frac{1}{N}\sum_{k=1}^{N} w(y_k)$ , $y_1, y_2, \ldots$ distributed with density $G(y)$. (7.2)

If w is taken equal to F/G, then the expectation value $E(G_N[w]) = \langle F \rangle_M$ and the variance
$V(G_N[w]) = V_G[F]/N$, where

$V_G[F] := \langle F^2/G \rangle_M - \langle F \rangle_M^2$ . (7.3)

If $\langle F^2/G \rangle_M$ exists, so that $V_G[F]$ is a finite number, then $G_N[w]$ converges in probability to $\langle F \rangle_M$,
and $V_G[F]/N$ can be interpreted as the expected squared error (Section 2.1.5.2). This number
is positive by definition, and extremalization with respect to G leads to $G = |F|/\langle |F| \rangle_M$, which
minimizes $V_G[F]$ to zero if F is positive. Importance sampling can be interpreted as the effort to
make G look like $|F|$ as much as possible. The squared error can be estimated with



$G_N^{(2)}[w] := \frac{G_N[w^2] - G_N[w]^2}{N - 1}$ , (7.4)

which satisfies $E(G_N^{(2)}[w]) = V(G_N[w])$. The integration is done most efficiently if the numbers
$w(y_k)$ fluctuate as little as possible, so that $G_N^{(2)}[w]$ is as small as possible. That is what importance
sampling should take care of. The expected squared error on the error can be estimated
with the help of

$G_N^{(4)}[w] := \frac{G_N[w^4] - 4\,G_N[w^3]\,G_N[w] + 3\,G_N[w^2]^2}{N(N-2)(N-3)} - \frac{(4N-6)\,G_N^{(2)}[w]^2}{(N-2)(N-3)}$ , (7.5)

which satisfies $E(G_N^{(4)}[w]) = V(G_N^{(2)}[w])$. If $\langle F^4/G^3 \rangle_M$ exists, then $G_N^{(4)}[F/G]$ is a good estimator
for the expected squared error on the estimate of the expected squared integration error.
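The estimators of this section can be tried out on a one-dimensional toy problem. The Python sketch below is only an illustration under stated assumptions: the integrand $F(y) = e^y$ on $[0,1]$, the density $G(y) = 2(1+y)/3$ and the function name `mc_estimate` are choices made here, not taken from the text. It returns the average $G_N[w]$ for $w = F/G$ together with the squared-error estimate of Eq. (7.4), once for flat sampling and once for importance sampling.

```python
import math
import random


def mc_estimate(f_over_g, sample_g, n, rng):
    """Return (G_N[w], G_N^(2)[w]) for w = F/G, with points drawn from density G."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        w = f_over_g(sample_g(rng))
        total += w
        total_sq += w * w
    mean = total / n
    err_sq = (total_sq / n - mean * mean) / (n - 1)  # Eq. (7.4)
    return mean, err_sq


# Illustrative integrand on M = [0, 1]: F(y) = exp(y), exact integral e - 1.
F = math.exp

# Flat sampling: G(y) = 1, so w = F.
flat = mc_estimate(F, lambda rng: rng.random(), 20000, random.Random(1))


# Importance sampling with G(y) = 2(1 + y)/3, generated by inverting its
# cumulative distribution (y^2 + 2y)/3 = u, i.e. y = sqrt(1 + 3u) - 1.
def w_importance(y):
    return F(y) / (2.0 * (1.0 + y) / 3.0)


imp = mc_estimate(w_importance,
                  lambda rng: math.sqrt(1.0 + 3.0 * rng.random()) - 1.0,
                  20000, random.Random(1))

print("flat      :", flat[0], "+/-", math.sqrt(flat[1]))
print("importance:", imp[0], "+/-", math.sqrt(imp[1]))
```

Because this G follows the shape of F more closely than the flat density does, the weights $w(y_k)$ fluctuate less and the estimated error is smaller for the same number of points.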
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The integration problem is now for large part reduced to that of generating the sequence of 
points in M with the density G. In practice, the mapping cp still has to be made explicit, because 
algorithms usually start with the generation of numbers between and 1 , but the analysis of 
the whole algorithm can be done on a 'higher level' by considering the piece that generates the 
V) -variables a given 'black box'. The connection with Section 1.1.2 can be established by taking 
gcp := G o (p. 
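
The estimators of Eqs. (7.2) and (7.4) can be illustrated with a minimal Python sketch. The integrand F(y) = 3y^2 on [0,1], the density G(y) = 2y and all function names are choices made for this example only; they do not come from the text.

```python
import math
import random

def importance_sample(F, G, sample_G, n_points, rng):
    """Return (G_N[w], G_N^(2)[w]) for w = F/G, following Eqs. (7.2) and (7.4)."""
    sum_w = 0.0
    sum_w2 = 0.0
    for _ in range(n_points):
        y = sample_G(rng)              # y is distributed with density G
        w = F(y) / G(y)
        sum_w += w
        sum_w2 += w * w
    mean_w = sum_w / n_points
    mean_w2 = sum_w2 / n_points
    squared_error = (mean_w2 - mean_w ** 2) / (n_points - 1)
    return mean_w, squared_error

# Illustrative choices (assumptions of this example, not taken from the text):
F = lambda y: 3.0 * y * y                               # integrand, exact integral is 1
G = lambda y: 2.0 * y                                   # normalized density on (0, 1]
sample_G = lambda rng: math.sqrt(1.0 - rng.random())    # y = sqrt(x) with x in (0, 1]

rng = random.Random(12345)
estimate, squared_error = importance_sample(F, G, sample_G, 20000, rng)
print(estimate, math.sqrt(squared_error))
```

Because G resembles F more closely than the uniform density does, the weights w = F/G fluctuate less, which is exactly the effect importance sampling aims for.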

7.2 The unitary algorithm formalism

In general, it is hard to find an efficient algorithm to generate sequences with a given distribution. 
It is often even hard to determine the density under which a sequence, generated with a given 
algorithm, is distributed. For a certain class of algorithms the latter can be done analytically with 
the unitary algorithm formalism (UAF). This is the class of unitary algorithms, that is, algorithms 
which produce an output with probability one. This may sound a bit mysterious, and in order 
to explain what we mean, we introduce the class of stepwise unitary (SU) algorithms, of which 
all steps produce an output with probability one. We illustrate this with an example of a non-SU 
algorithm to generate a fair dice with five sides: 

1. throw a fair dice, output number of points; 

2. if the number of points is less than six: output ← number of points, else throw again.

The first step produces an output with probability one: if you throw a dice, it generates a number.
The second step, however, produces an output with probability 5/6, so that the algorithm is not
SU. The whole algorithm is unitary: the probability to produce no output is $\lim_{n\to\infty} (1/6)^n = 0$.
Consequently, stepwise unitary is a relative concept. The steps may be "black boxes" that can be
trusted to produce an output with probability one. Consider the following algorithm to generate
numbers between 1 and 11:

1. throw a fair dice with five sides, output ← number of points;

2. throw the dice again, output ← number of points plus previous number of points.

This is a SU algorithm as long as we do not ask the question how the fair dice with five sides is 
generated. 
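
The two dice algorithms can be written down in a few lines of Python. This is only a sketch; the function names are made up for the illustration.

```python
import random

def five_sided_die(rng):
    """Unitary but not stepwise unitary: repeat the throw until it is below six."""
    while True:
        points = rng.randint(1, 6)   # step 1: always produces a number
        if points < 6:               # step 2: produces an output with probability 5/6
            return points

def sum_of_two(rng):
    """Stepwise unitary as long as five_sided_die is treated as a black box."""
    return five_sided_die(rng) + five_sided_die(rng)

rng = random.Random(1)
throws = [five_sided_die(rng) for _ in range(10000)]
sums = [sum_of_two(rng) for _ in range(10000)]
```

The `while` loop terminates with probability one, which is the statement that the algorithm as a whole is unitary.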

7.2.1 Notation 

We shall introduce the UAF with the help of two examples, but first we have to introduce some
notation. We will frequently use the logical step function $\theta$, which returns 1 if a statement $\Pi$ is
true, and 0 otherwise:

\[ \theta(\Pi) := \begin{cases} 1 & \text{if } \Pi \text{ is true,} \\ 0 & \text{if } \Pi \text{ is false.} \end{cases} \qquad (7.6) \]

A relation that is satisfied by this function and that we will often use is

\[ \theta(\Pi) = 1 - \theta(\text{not } \Pi) . \qquad (7.7) \]

Also the Dirac delta-distribution will often appear, and we recall its most important features
(cf. [4]): if F is a sufficiently regular function on M, then

\[ \int_M F(y)\,\delta(y - y')\,dy = F(y') , \qquad (7.8) \]

and if $\varphi : x \mapsto y$ is an invertible and differentiable mapping, then

\[ \delta(\varphi^{-1}(y) - x) = |J_\varphi(x)|\,\delta(\varphi(x) - y) , \qquad (7.9) \]

where $J_\varphi$ is the determinant of the Jacobian matrix of $\varphi$. An integral over many variables will
from now on start with a single $\int$-symbol, and for every variable z a dz means 'integrate z over
the appropriate integration region'. The order in which the variables appear is irrelevant. If it is
not evident what the 'appropriate integration region' is, we shall make it explicit with the help of
$\theta$-functions.



7.2.2 The UAF for SU algorithms 

The following is an example of the use of the UAF for a SU algorithm. It is an algorithm to
generate n numbers $y_i$, $i = 1, \ldots, n$, uniformly distributed in [0,1] such that their sum is equal
to 1, and we are going to prove that it actually does. The algorithm goes as follows:

1. generate n numbers $z_i$, $i = 1, \ldots, n$, in $[0, \infty)$, distributed independently
and with density $e^{-z_i}$;

2. put $L \leftarrow \sum_{i=1}^{n} z_i$;

3. put $y_i \leftarrow z_i/L$ for $i = 1, \ldots, n$.

The algorithm clearly produces numbers the sum of which is equal to 1. The question is whether
they are distributed uniformly, i.e., whether the density is, up to a normalization constant, equal
to $\delta(1 - \sum_{i=1}^{n} y_i)$. The UAF can answer the question as follows. Write every generation of a
variable in the algorithm as the integral over the density with which it is generated, and interpret
every assignment as a generation with a density that is given by a Dirac delta-distribution. Only
the assignment of the final output should not be written as an integral, but only with the
delta-distributions. The integral obtained gives the generated density P. So in this case we have

\[ P(y) = \int \Big(\prod_{i=1}^{n} dz_i\, e^{-z_i}\Big)\, dL\, \delta\Big(L - \sum_{i=1}^{n} z_i\Big) \Big(\prod_{i=1}^{n} \delta(y_i - z_i/L)\Big) . \qquad (7.10) \]
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The unitarity of the algorithm is represented by the fact that integration over the y-variables 
of this equation gives the identity 1 = 1. To find the density, we have to eliminate the $z_i$- and
L-integrals. Application of the rules for the delta-distributions gives

\[ P(y) = \delta\Big(1 - \sum_{i=1}^{n} y_i\Big) \int dL\, L^{n-1} e^{-L}\, \theta(0 < L < \infty) = \Gamma(n)\, \delta\Big(1 - \sum_{i=1}^{n} y_i\Big) , \qquad (7.11) \]

and we see that the $y_i$-variables are generated with the correct density. We even calculated
the normalization factor, which is $\Gamma(n) = (n-1)!$. The step 'generate z with density $e^{-z}$' is a
black box in this example, but can be made explicit. Such a variable can be obtained by generating
x uniformly in [0, 1] and putting $z \leftarrow -\log(x)$, since

\[ \int dx\, \delta(z + \log(x))\, \theta(0 < x < 1) = e^{-z} . \qquad (7.12) \]
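
The three steps of the algorithm, with the black box made explicit through z ← -log(x), can be sketched in Python as follows. The function name is an assumption of this example.

```python
import math
import random

def uniform_on_simplex(n, rng):
    """Generate y_1..y_n in [0,1] with sum 1, following the three steps above."""
    # step 1: z_i with density exp(-z_i), obtained as -log(x) with x uniform in (0, 1]
    z = [-math.log(1.0 - rng.random()) for _ in range(n)]
    # step 2: L is the sum of the z_i
    L = sum(z)
    # step 3: y_i = z_i / L
    return [zi / L for zi in z]

rng = random.Random(42)
y = uniform_on_simplex(5, rng)
print(y, sum(y))
```

The expression `1.0 - rng.random()` is used instead of `rng.random()` only to exclude the argument 0 of the logarithm.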



7.2.3 The UAF for non-SU algorithms 

As an application of the UAF to a non-SU algorithm, we show the correctness of the
ratio-of-uniforms method for the generation of a random variable with a given density g. The algorithm
goes as follows [5]. Let $b \ge \sup_y \sqrt{g(y)}$, $a_- \le \inf_y y\sqrt{g(y)}$ and $a_+ \ge \sup_y y\sqrt{g(y)}$. Then
one has to

1. generate $x_1$ uniformly in $[a_-, a_+]$;

2. generate $x_2$ uniformly in $[0, b]$;

3. if $x_2^2 \le g(x_1/x_2)$ then put $y \leftarrow x_1/x_2$, else reject $x_1, x_2$ and start anew.

Just as our algorithm for the fair dice with five sides, it uses the rejection method, and the third
step is not unitary. For this algorithm, we can write down a recursive equation for the probability
density P that is generated. If we denote the volume of the space in which $x_1$ and $x_2$ are generated
by V, then this equation is given by

\[ P(y) = \frac{1}{V}\int dx_1\,dx_2 \left[ \theta\big(x_2^2 \le g(x_1/x_2)\big)\,\delta(y - x_1/x_2) + \theta\big(x_2^2 > g(x_1/x_2)\big)\,P(y) \right] . \qquad (7.13) \]

Integration of the equation over y gives the identity 1 = 1 again, expressing the unitarity of the
algorithm. If we now use Eq. (7.7) and replace the variables $x_1, x_2$ by $t := x_1/x_2$ and $z := x_2^2$,
for which $dx_1\,dx_2 = \frac{1}{2}\,dt\,dz$, we get the equation

\[ P(y) = \frac{1}{V}\int dt\,dz\,\tfrac{1}{2}\left\{ \theta\big(z \le g(t)\big)\,[\delta(y - t) - P(y)] + P(y) \right\} . \qquad (7.14) \]

The region in which $x_1$ and $x_2$ are generated is such that the whole region $0 \le z \le g(t)$ is
covered. Furthermore, $\frac{1}{2}\int dt\,dz = \int dx_1\,dx_2 = V$, so that the equation becomes

\[ \int dt\, g(t)\,[\delta(y - t) - P(y)] = 0 \quad\Longleftrightarrow\quad P(y) = \frac{g(y)}{\int dt\, g(t)} , \qquad (7.15) \]

and we see that the algorithm is correct. We even see that the function g, used in the algorithm, 
does not have to be normalized: the algorithm itself is unitary. 
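
A minimal Python sketch of the ratio-of-uniforms method is given below. It uses the names u for the variable generated in [0, b] and v for the one generated in [a_-, a_+], so that the candidate is the ratio v/u; these names, the function name and the choice of an unnormalized Gaussian for g are assumptions of this example.

```python
import math
import random

def ratio_of_uniforms(g, b, a_minus, a_plus, rng):
    """Return one number distributed with density g / (integral of g)."""
    while True:
        u = b * (1.0 - rng.random())        # uniform in (0, b]
        v = rng.uniform(a_minus, a_plus)    # uniform in [a_minus, a_plus]
        y = v / u
        if u * u <= g(y):                   # accept, otherwise start anew
            return y

# Illustrative target: g(y) = exp(-y^2/2), deliberately left unnormalized.
g = lambda y: math.exp(-0.5 * y * y)
b = 1.0                                     # sup of sqrt(g)
a = math.sqrt(2.0 / math.e)                 # sup of |y| sqrt(g(y)), reached at y = sqrt(2)

rng = random.Random(3)
samples = [ratio_of_uniforms(g, b, -a, a, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)
```

The printed mean and variance should be close to 0 and 1, although g was never normalized, in line with the remark that the algorithm itself is unitary.
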
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7.3 Some useful techniques 

7.3.1 Inversion 

The most straightforward way of generating random variables $y \in M$ following a certain
probability distribution G is with an invertible mapping $\varphi : K \to M$, by generating x uniformly in K and putting
$y \leftarrow \varphi(x)$. The generated density P is given by

\[ P(y) = \int_K dx\,\delta(y - \varphi(x)) = |J_{\varphi^{-1}}(y)| , \qquad (7.16) \]

so that $|J_{\varphi^{-1}}(y)|$ should be equal to G(y). The search for $\varphi$ is an integration and inversion
problem, and is usually very hard to solve in practice, even for one-dimensional variables.
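
One of the few cases in which the mapping can be written down in closed form is the Cauchy density G(y) = 1/(π(1+y²)) on the real line. The following Python sketch uses it purely as an illustration; the choice of density and the function name are assumptions of this example.

```python
import math
import random

def sample_cauchy(rng):
    """Inversion: phi(x) = tan(pi (x - 1/2)), whose inverse has Jacobian 1/(pi (1 + y^2))."""
    x = rng.random()                      # uniform in [0, 1)
    return math.tan(math.pi * (x - 0.5))

rng = random.Random(5)
samples = [sample_cauchy(rng) for _ in range(20000)]
# For the Cauchy density, half of the probability lies in the interval (-1, 1).
fraction_inside = sum(1 for y in samples if abs(y) < 1.0) / len(samples)
print(fraction_inside)
```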

7.3.2 Crude MC 

Sometimes, part of the difficulty of the integration problem lies in the shape of the integration
region M, which might be complicated. Usually, however, it can be seen as a subspace of a
simpler manifold $\bar M$ with the same dimension. One can then look for a probability density G on $\bar M$,
and integrate the function $F\vartheta_M$, where $\vartheta_M$ is the characteristic function of M. This just
means that a density

\[ \frac{G(y)\,\vartheta_M(y)}{\int_{M} G(x)\,dx} \qquad (7.17) \]

is used on M. The algorithm to generate a sequence of variables y following this density is very
simple:

1. generate x in $\bar M$ following G;

2. if $x \in M$ then put $y \leftarrow x$, else reject x and start anew.

This is called crude or hit-and-miss MC. The proof of the correctness is also simple. In the UAF,
the generated density P satisfies

\[ P(y) = \int_{\bar M} dx\, G(x)\,[\theta(x \in M)\,\delta(y - x) + \theta(x \notin M)\,P(y)] . \qquad (7.18) \]

If we use Eq. (7.7) and evaluate the integrals, (7.17) is found as the solution to the equation.

In principle, this method always works, but can be inefficient if the volume of $\bar M$ is much
larger than the volume of M. If the integrand is as simple as possible, i.e., F(y) = 1 so that the
original problem was that of determining the volume Vol(M) of M, and if one would take for G
the uniform distribution on $\bar M$, then

\[ V_G[F] = \mathrm{Vol}(M)\,[\mathrm{Vol}(\bar M) - \mathrm{Vol}(M)] . \qquad (7.19) \]

So if the difference between the volumes is large, one had better choose a density G that is
substantially larger on M than on $\bar M \setminus M$.
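
As a small illustration of hit-and-miss MC and of Eq. (7.19), the Python sketch below estimates the area of the unit disk embedded in the square [-1,1]^2. The geometry and all names are assumptions of this example.

```python
import math
import random

def hit_and_miss_volume(inside, sample_box, box_volume, n_points, rng):
    """Estimate Vol(M) from uniform points in the enclosing region, with squared error."""
    hits = 0
    for _ in range(n_points):
        if inside(sample_box(rng)):
            hits += 1
    volume = box_volume * hits / n_points
    # Eq. (7.19) divided by N, with Vol(M) replaced by its estimate
    squared_error = volume * (box_volume - volume) / n_points
    return volume, squared_error

inside_disk = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
sample_square = lambda rng: (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))

rng = random.Random(2024)
volume, squared_error = hit_and_miss_volume(inside_disk, sample_square, 4.0, 40000, rng)
print(volume, math.sqrt(squared_error))
```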
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7.3.3 Rejection 

Crude MC is a special case of the rejection method to generate variables y following a density
F on M. One needs an algorithm to generate variables on M following some density G and a
number c such that $cG(y) \ge F(y)$ for all $y \in M$. To obtain the density F, one should

1. generate x following G, and $\rho$ uniformly in [0, 1];

2. if $\rho\,cG(x) \le F(x)$ then put $y \leftarrow x$, else reject x, $\rho$ and start anew.

The generated density P satisfies

\[ P(y) = \int dx\,G(x)\,d\rho\,[\theta(\rho\,cG(x) \le F(x))\,\delta(y - x) + \theta(\rho\,cG(x) > F(x))\,P(y)] , \qquad (7.20) \]

and if we use Eq. (7.7) again and the fact that $\int_0^1 d\rho\,\theta(a\rho \le b) = b/a$ for $0 \le b \le a$, the solution
$P(y) = F(y)/\langle F\rangle_M$ is found. In principle, this method works for any bounded F, since there are
always an easy-to-generate density G and a c that will do the job: $G = 1/\mathrm{Vol}(M)$ and
$c \ge \mathrm{Vol}(M)\,\sup_{y \in M} F(y)$. However, the algorithm can become very inefficient. The efficiency can
be expressed by $\langle F\rangle_M/(c\langle G\rangle_M)$, and if this number is small, the variable x will often be rejected
in step 2.
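
The two steps of the rejection method translate directly into Python. In this sketch the target F(y) = y(1-y) on [0,1], the uniform density G and the bound c = 1/4 are choices made for the illustration, as are the names.

```python
import random

def rejection(F, G, sample_G, c, rng):
    """Return one number distributed with density F / <F>, given c G(y) >= F(y)."""
    while True:
        x = sample_G(rng)               # step 1: x following G ...
        rho = rng.random()              # ... and rho uniform in [0, 1)
        if rho * c * G(x) <= F(x):      # step 2: accept, else start anew
            return x

F = lambda y: y * (1.0 - y)             # unnormalized target, maximum 1/4 at y = 1/2
G = lambda y: 1.0                       # uniform density on [0, 1]
sample_G = lambda rng: rng.random()
c = 0.25                                # smallest c with c G >= F

rng = random.Random(11)
samples = [rejection(F, G, sample_G, c, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)
```

For this choice the efficiency is (1/6)/(1/4) = 2/3, so on average one candidate in three is rejected.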



7.3.4 Sum of densities 

As an example in which integer random variables have to be generated, we present a method
to generate a density that is the normalized sum of a number of positive functions $g_i$ with
$i = 1, \ldots, n$. To generate the density

\[ G(y) = \sum_{i=1}^{n} g_i(y) , \qquad \bar g_i(y) := \frac{g_i(y)}{\langle g_i\rangle_M} , \qquad (7.21) \]

one has to

1. generate an integer i with probability $\langle g_i\rangle_M$ and put $k \leftarrow i$;

2. generate x with density $\bar g_k$ and put $y \leftarrow x$.

To cast it into the UAF, a summation over the integer random variable has to be included into the
equation for the density generated:

\[ P(y) = \sum_{i,k=1}^{n} \langle g_i\rangle_M\,\delta_{i,k} \int dx\,\bar g_k(x)\,\delta(y - x) . \qquad (7.22) \]

The assignment '$k \leftarrow i$' is represented by the Kronecker delta-symbol $\delta_{i,k}$. Evaluation of the
integrals leads trivially to the correct density. To generate i, the unit interval [0,1] can be dissected
into n bins of size $\langle g_i\rangle_M$, and i becomes the number of the bin a random number z, distributed
uniformly in [0,1], falls in.
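
The bin selection and the subsequent generation can be sketched in Python as follows. The example density G(y) = 0.3 + 0.7·2y on [0,1] and all names are assumptions made for the illustration.

```python
import math
import random

def choose_bin(probabilities, rng):
    """Return the index of the bin of [0,1] in which a uniform random number falls."""
    z = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if z < cumulative:
            return i
    return len(probabilities) - 1       # guard against rounding in the last bin

def sample_sum_of_densities(probabilities, samplers, rng):
    k = choose_bin(probabilities, rng)  # step 1: integer with the given probability
    return samplers[k](rng)             # step 2: x with the k-th normalized density

# Illustrative choice: 0.3 times the uniform density plus 0.7 times the density 2y.
probabilities = [0.3, 0.7]
samplers = [lambda rng: rng.random(),
            lambda rng: math.sqrt(rng.random())]

rng = random.Random(8)
samples = [sample_sum_of_densities(probabilities, samplers, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)   # the exact mean is 0.3 * 1/2 + 0.7 * 2/3
```
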
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7.3.5 Adaptive MC 

If one considers the actual calculation of a MC-integral a real random process, it is a small
step to the question whether the density G may change during the process. The answer is yes
(Section 2.1.5.2). If $y_1$ is generated following a density $G_1$, and then $y_2$ following $G_2$ which
depends on $y_1$, and then $y_3$ following $G_3$ which depends on $\{y\}_2 := \{y_1, y_2\}$ and so on, then
$\frac{1}{N}\sum_{k=1}^{N} F(y_k)/G_k(\{y\}_k)$ converges in probability to $\langle F\rangle_M$, with an estimated squared error
given by $V_N[F]/N$, where

\[ V_N[F] := \frac{1}{N}\sum_{k=1}^{N}\Big\langle \frac{F^2}{\bar G_k}\Big\rangle_M - \langle F\rangle_M^2 , \qquad (7.23) \]

with

\[ \frac{1}{\bar G_k(y_k)} := \int_{M^{k-1}} \frac{G_1(y_1)\cdots G_{k-1}(\{y\}_{k-1})}{G_k(\{y\}_k)}\,dy_1\cdots dy_{k-1} . \qquad (7.24) \]

The explicit dependence on N of this expression shows that, by adapting the density for each
integration point in the right way, the error may be reduced and the integration process optimized.

An example of a method to adapt the variance is weight optimization in multichannel-MC
[30], in which a density $G_\alpha(y) := \sum_{i=1}^{n} \alpha_i g_i(y)$ is generated, where each function $g_i$ is a
probability density itself, and the parameters $\alpha_i$ are positive and satisfy $\sum_{i=1}^{n} \alpha_i = 1$. Let us
define

\[ W_i(\alpha) := \int_M \frac{F(y)^2}{G_\alpha(y)^2}\,g_i(y)\,dy , \qquad (7.25) \]

so that $V_{G_\alpha}[F] = \sum_{i=1}^{n} \alpha_i W_i(\alpha) - \langle F\rangle_M^2$. It is not difficult to see that the variance, as function of
the parameters $\alpha_i$, has a (local) minimum if the values of these parameters are such that $W_i(\alpha)$
has the same value for all $i = 1, \ldots, n$. Of course, the problem of finding these values is possibly
even more difficult than the original integration problem, but with adaptive MC the values might
be found approximately using an iterative procedure. The variance will then improve with each
step.

In [30], the following is suggested: one starts with some (sensible) values for the parameters
$\alpha_i$ and, after generating a number of N points $y_k$ following the density $G_\alpha$, one estimates $W_i(\alpha)$
with

\[ E_i := \frac{1}{N}\sum_{k=1}^{N} \frac{g_i(y_k)\,F(y_k)^2}{G_\alpha(y_k)^3} , \qquad (7.26) \]

for all $i = 1, \ldots, n$. These numbers are then used to improve the values of the parameters, for
example through the prescription

\[ \alpha_i \leftarrow c\,\alpha_i\sqrt{E_i} , \qquad (7.27) \]






where c is some constant. The plausibility of this prescription is supported by the example of
stratified sampling, in which the functions $g_i$ are normalized characteristic functions $\vartheta_i/\langle\vartheta_i\rangle_M$ of
non-overlapping subspaces of the integration region. In that case, $W_i(\alpha) = \alpha_i^{-2}\langle\vartheta_i\rangle_M\langle\vartheta_i F^2\rangle_M$,
and we see that putting $\alpha_i \leftarrow c\,\alpha_i\sqrt{W_i(\alpha)}$ will give the local minimum immediately, starting
from any configuration for the parameters $\alpha_i$.
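
The iterative weight optimization can be sketched in Python as below. This is only an illustration under stated assumptions: two channels on [0,1] (the uniform density and the density 2y), the integrand F(y) = 3y^2, the fixed number of iterations, and all names are choices of this example; the constant c is fixed by requiring that the new weights sum to 1.

```python
import math
import random

def multichannel(F, densities, samplers, alphas, n_points, n_iterations, rng):
    """Return (estimate of <F>, optimized weights) after a few weight updates."""
    n = len(alphas)
    estimate = 0.0
    for _ in range(n_iterations):
        sum_w = 0.0
        sums = [0.0] * n
        for _ in range(n_points):
            i = rng.choices(range(n), weights=alphas)[0]    # pick a channel
            y = samplers[i](rng)                            # y follows G_alpha overall
            g = [density(y) for density in densities]
            G = sum(a * gi for a, gi in zip(alphas, g))
            w = F(y) / G
            sum_w += w
            for j in range(n):
                sums[j] += g[j] * w * w / G                 # g_j F^2 / G_alpha^3
        estimate = sum_w / n_points
        new = [a * math.sqrt(s / n_points) for a, s in zip(alphas, sums)]
        norm = sum(new)                                     # fixes the constant c
        alphas = [x / norm for x in new]
    return estimate, alphas

F = lambda y: 3.0 * y * y
densities = [lambda y: 1.0, lambda y: 2.0 * y]
samplers = [lambda rng: rng.random(), lambda rng: math.sqrt(1.0 - rng.random())]

rng = random.Random(99)
estimate, alphas = multichannel(F, densities, samplers, [0.5, 0.5], 4000, 5, rng)
print(estimate, alphas)
```

In this example the channel with density 2y resembles the integrand more closely, so its weight grows from one iteration to the next.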

7.4 Random momenta beautifully organized 

As mentioned in Section 1.2.2, particle physicists often need to integrate differential cross sec- 
tions over phase space (PS), which is the space of all physically possible final-state momentum 
configurations. Usually, it depends on the transition amplitude which configurations are allowed, 
and here we mean by PS the space of all final-state momentum configurations for which the
separate momenta sum up to a given momentum, and for which the particles have given masses.
Because these restrictions reduce the dimension of the integration region, it has measure zero in
the space of all momentum configurations, so that the crude MC method is not an option.

One way to generate PS is by sequential two-body decays, i.e., by the recursive splitting of 
each momentum generated so far into two momenta (cf. [31]). The drawback of this method is 
that the efficiency is poor if the number of momenta and the total energy become large (cf. [32]). 
The high-energy limit is equivalent with the limit in which the masses of the particles become 
negligible, and for this situation, RAMBO [32] can be used. It generates any number of massless 
momenta with a given total energy, distributed uniformly in PS. We will not deal with the
algorithm adapted for the generation of massive momenta [33]. Another approach to PS generation,
which we will also not address, uses the Metropolis algorithm [34].

7.4.1 Notation 

The relativistic momentum of an elementary particle is a vector in $\mathbf{R}^4$. Its first component, also
called the 0-component, gives the energy of the particle¹, and the other three components give
the real momentum in three-dimensional space:

\[ p = (p^0, p^1, p^2, p^3) := (p^0, \vec p) . \qquad (7.28) \]

The momentum with the opposite 3-momentum is denoted by

\[ \tilde p := (p^0, -\vec p) . \qquad (7.29) \]

The interpretation of $\mathbf{R}^4$ as a real vector space can be carried forward in the sense that a system of
non-interacting particles has a momentum that is equal to the sum of the momenta of the separate
particles. We shall need the first and the fourth canonical basis vectors, which we denote

\[ e_0 := (1,0,0,0) \quad\text{and}\quad e_3 := (0,0,0,1) . \qquad (7.30) \]

¹ We use units in which the speed of light is equal to 1.

$\mathbf{R}^4$ becomes Minkowski space if it is endowed with the Lorentz invariant quadratic form

\[ p^2 := (p^0)^2 - |\vec p|^2 , \qquad |\vec p| := [(p^1)^2 + (p^2)^2 + (p^3)^2]^{1/2} . \qquad (7.31) \]

The same notation for the quadratic form and the 2-component will not lead to confusion,
because the 2-component will not appear explicitly anymore after this section. The combination
$(p + q)^2 - p^2 - q^2$ defines a bi-linear product of two momenta, which is two times the scalar
product

\[ (pq) := p\cdot q := p^0q^0 - \vec p\cdot\vec q , \qquad \vec p\cdot\vec q := p^1q^1 + p^2q^2 + p^3q^3 . \qquad (7.32) \]

The notation with the parentheses shall be used in the next chapter. For physical particles, $p^2$ has
to be positive, and in that case, the square root gives the invariant mass of the particle:

\[ m_p := \sqrt{p^2} \quad\text{if}\quad p^2 \ge 0 . \qquad (7.33) \]

The group of linear transformations on $\mathbf{R}^4$ that leave the quadratic form invariant, and the
members of which have determinant 1 and leave the sign of the 0-component of a momentum
invariant, is called the Lorentz group. It is generated by boosts, which are represented by symmetric
matrices, and rotations, which are represented by orthogonal matrices. A boost that transforms a
momentum p, with $p^2 > 0$, to $m_pe_0$ is denoted $\mathcal{H}_p$, so

\[ \mathcal{H}_p\,p = m_pe_0 \quad\text{and}\quad m_p\mathcal{H}_p^{-1}e_0 = p . \qquad (7.34) \]

More explicitly, such a boost is given by

\[ \mathcal{H}_p\,q = (a, \vec q - b\,\vec p) \quad\text{where}\quad a = \frac{(pq)}{m_p} , \quad b = \frac{q^0 + a}{p^0 + m_p} . \qquad (7.35) \]

A rotation that transforms p to $p^0e_0 + |\vec p|e_3$ is denoted $\mathcal{R}_p$, so

\[ \mathcal{R}_p\,p = p^0e_0 + |\vec p|e_3 \quad\text{and}\quad \mathcal{R}_p\,\tilde p = p^0e_0 - |\vec p|e_3 . \qquad (7.36) \]

Since rotations only change the 3-momentum, we shall use the same symbol if a rotation is
restricted to three-dimensional space.

The physical PS of n particles is the (3n − 4)-dimensional subspace of $\mathbf{R}^{4n}$, given by the
restrictions that the energies of the particles are positive, the invariant masses squared $p_i^2$ are
fixed to given positive values $s_i$, and that the sum

\[ p_{(n)} := \sum_{i=1}^{n} p_i \qquad (7.37) \]

of the momenta is fixed to a given momentum P. The restrictions for the separate momenta can
be expressed with a 'PS characteristic distribution'

\[ \vartheta_{s_i}(p) := \delta(p^2 - s_i)\,\theta(p^0 > 0) , \quad\text{and}\quad \vartheta(p) := \vartheta_0(p) . \qquad (7.38) \]

The generic PS integral, of a function F of a set $\{p\}_n := \{p_1, \ldots, p_n\}$ of momenta, that has to be
calculated is then given by

\[ \int \Big(\prod_{i=1}^{n} d^4p_i\,\vartheta_{s_i}(p_i)\Big)\,\delta^4(p_{(n)} - P)\,F(\{p\}_n) . \qquad (7.39) \]

We explicitly write down the number of degrees of freedom in the differentials and the
delta-distributions in order to keep track of the dimensions. Each momentum component carries the
dimension of a mass.



7.4.2 The algorithm 

RAMBO was developed with the aim to generate the flat PS distribution of n massless momenta
as uniformly as possible, and such that the sum of the momenta is equal to $\sqrt{s}\,e_0$ with s a given
squared energy. This means that the system of momenta is in its center-of-mass frame (CMF),
and that the density is proportional to the 'PS characteristic distribution'

\[ \Theta_s(\{p\}_n) := \delta^4(p_{(n)} - \sqrt{s}\,e_0)\prod_{i=1}^{n}\vartheta(p_i) . \qquad (7.40) \]

The algorithm consists of the following steps:

Algorithm 7.4.1 (RAMBO)

1. generate n massless vectors $q_j$ with positive energy without constraints but under some
normalized density $f(q_j)$;

2. compute the sum $q_{(n)}$ of the momenta $q_j$;

3. determine the Lorentz boost and scaling transform that bring $q_{(n)}$ to $\sqrt{s}\,e_0$;

4. perform these transformations on the $q_j$, and call the result $p_j$.

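Trivially, the algorithm generates momenta that satisfy the various $\delta$-constraints, but it is not
clear a priori that the momenta have the correct distribution. To prove that they actually do, we
apply the UAF. It tells us that the generated density $\Phi_s$ is given by

\[ \Phi_s(\{p\}_n) = \int \Big(\prod_{j=1}^{n} d^4q_j\,\vartheta(q_j)\,f(q_j)\Big)\, d^4b\,\delta^4\Big(b - \frac{q_{(n)}}{m_{q_{(n)}}}\Big)\, dx\,\delta\Big(x - \frac{\sqrt{s}}{m_{q_{(n)}}}\Big) \prod_{i=1}^{n}\delta^4(p_i - x\mathcal{H}_b\,q_i) . \qquad (7.41) \]

To calculate the distribution yielded by this algorithm, the integral has to be evaluated. First of
all, some simple algebra using $p_{(n)} = x\mathcal{H}_b\,q_{(n)}$, $q_{(n)} = x^{-1}\mathcal{H}_b^{-1}p_{(n)}$ and the Lorentz and scaling
properties of the Dirac $\delta$-distributions leads to

\[ \delta^4\Big(b - \frac{q_{(n)}}{m_{q_{(n)}}}\Big)\,\delta\Big(x - \frac{\sqrt{s}}{m_{q_{(n)}}}\Big) = \frac{2s^2}{x}\,\delta^4\big(p_{(n)} - \sqrt{s}\,e_0\big)\,\delta(b^2 - 1) . \qquad (7.42) \]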
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Furthermore, since we may write 

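\[ d^4q_j\,\delta(q_j^2)\,\delta^4(p_j - x\mathcal{H}_b\,q_j) = \frac{1}{x^2}\,\delta(p_j^2) \qquad (7.43) \]

under the integral, the r.h.s. of Eq. (7.41) becomes

\[ \Theta_s(\{p\}_n) \int d^4b\,\delta(b^2 - 1)\,dx\,\frac{2s^2}{x^{2n+1}} \prod_{i=1}^{n} f\Big(\frac{1}{x}\mathcal{H}_b^{-1}p_i\Big)\,\theta\big(e_0\cdot\mathcal{H}_b^{-1}p_i > 0\big) . \qquad (7.44) \]

In the standard RAMBO algorithm, the following choice is made for f:

\[ f(q) = \frac{c^2}{2\pi}\exp(-c\,q^0) , \qquad (7.45) \]

where c is a positive number with the dimension of an inverse mass. Therefore, if we use that
$p_{(n)} = \sqrt{s}\,e_0$ and that $(\mathcal{H}_b^{-1}q)^0 = \tilde b\cdot q$ for any q, then

\[ \prod_{i=1}^{n} f\Big(\frac{1}{x}\mathcal{H}_b^{-1}p_i\Big)\,\theta\big(e_0\cdot\mathcal{H}_b^{-1}p_i > 0\big) = \Big(\frac{c^2}{2\pi}\Big)^{n}\exp\Big(-\frac{c\sqrt{s}}{x}\,b^0\Big)\,\theta(b^0 > 0) . \qquad (7.46) \]

As a result of this, the variables $p_i$, $i = 1, \ldots, n$ only appear in $\Theta_s$, as required. The remaining
integral is calculated in the Appendix at the end of this section, with the result that RAMBO
generates the density

\[ \Phi_s(\{p\}_n) = \Theta_s(\{p\}_n)\,\Big(\frac{2}{\pi}\Big)^{n-1}\frac{\Gamma(n)\Gamma(n-1)}{s^{n-2}} . \qquad (7.47) \]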

Incidentally, we have computed here the volume of the PS for n massless particles:

\[ \int d^{4n}p\;\Theta_s(\{p\}_n) = \Big(\frac{\pi}{2}\Big)^{n-1}\frac{s^{n-2}}{\Gamma(n)\Gamma(n-1)} . \qquad (7.48) \]

Note, moreover, that c does not appear in the final answer; this is only natural since any change
in c will automatically be compensated by a change in the computed value for x. Finally, it is
important to realize that the 'original' PS has dimension 3n, while the resulting one has
dimension 3n − 4: there are configurations of the momenta $q_j$ that are different, but after boosting and
scaling end up as the same configuration of the $p_j$. It is this reduction of the dimensionality that
necessitates the integrals over b and x.

The first step of the algorithm consists of generating massless momenta with positive energy.
To generate such momenta, we use that

\[ d^4p\,\vartheta(p) = \tfrac{1}{2}\,d\varphi\,dz\,dp^0\,p^0\,\theta(p^0 > 0)\,\theta(0 < \varphi < 2\pi)\,\theta(-1 < z < 1) , \qquad (7.49) \]

with $p = (p^0, p^0\hat n(z,\varphi))$, where

\[ \hat n_1(z,\varphi) := \sqrt{1 - z^2}\,\sin\varphi , \qquad \hat n_2(z,\varphi) := \sqrt{1 - z^2}\,\cos\varphi , \qquad \hat n_3(z,\varphi) := z . \qquad (7.50) \]

From this we can directly see that, to generate p following a density proportional to $\vartheta(p)f(p^0)$,
one should
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Algorithm 7.4.2 (MASS LESS MOMENTUM) 

1. generate $p^0$ in $[0, \infty)$ following a density $\sim p^0f(p^0)$;

2. generate $\varphi$ uniformly distributed in $[0, 2\pi)$ and z uniformly in $[-1, 1]$;

3. construct $\hat n(z,\varphi)$ and put $p \leftarrow (p^0, p^0\hat n(z,\varphi))$.

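To generate $p^0$ following the density $p^0\exp(-p^0)$, one can

Algorithm 7.4.3 (0-COMPONENT)

1. generate $x_1$ and $x_2$ distributed uniformly in [0, 1];

2. put $p^0 \leftarrow -\log(x_1x_2)$,

since

\[ \int dx_1\,dx_2\,\theta(0 < x_{1,2} < 1)\,\delta(p^0 + \log(x_1x_2)) = e^{-p^0}\int_{e^{-p^0}}^{1}\frac{dx}{x} = p^0e^{-p^0} . \qquad (7.51) \]

7.4.3 Appendix

We have to calculate the integral

\[ \int dx\,d^4b\,\delta(b^2 - 1)\,\theta(b^0 > 0)\,\frac{2s^2}{x^{2n+1}}\Big(\frac{c^2}{2\pi}\Big)^{n}\exp\Big(-\frac{c\sqrt{s}}{x}\,b^0\Big) = \frac{2\,\Gamma(2n)\,B(n)}{(2\pi)^n s^{n-2}} , \]

where

\[ B(n) := \int d^4b\,\delta(b^2 - 1)\,\theta(b^0 > 0)\,(b^0)^{-2n} = 2\pi\int_1^\infty db^0\,(b^0)^{-2n}\sqrt{(b^0)^2 - 1} . \]

The 'Euler substitution' $b^0 := \frac{1}{2}(v^{1/2} + v^{-1/2})$ casts the integral in the form

\[ B(n) = 2^{2n-2}\pi\int_1^\infty dv\,\frac{v^{n-2}(v - 1)^2}{(v + 1)^{2n}} . \]

By the transformation $v \to 1/v$ it can easily be checked that the integral from 1 to $\infty$ is precisely
equal to that from 0 to 1, so that we may write

\[ B(n) = 2^{2n-3}\pi\int_0^\infty dv\,\frac{v^n - 2v^{n-1} + v^{n-2}}{(1 + v)^{2n}} = 4^{n-1}\pi\,\frac{\Gamma(n-1)\Gamma(n)}{\Gamma(2n)} , \]

where we have used, by writing z := 1/(1 + v), that

\[ \int_0^\infty dv\,\frac{v^p}{(1 + v)^q} = \int_0^1 dz\,z^{q-p-2}(1 - z)^p = \frac{\Gamma(q-p-1)\Gamma(p+1)}{\Gamma(q)} . \]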


Chapter 8 

Generating QCD-antennas 



An algorithm to generate random momenta, distributed with a density that contains the singular
structure typically found in QCD-processes, is introduced. For the notation used we refer to
Section 7.4.
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8.1 Introduction 



In future experiments with hadron colliders, such as the LHC, many multi-jet final states will 
occur, which have very high particle multiplicities. The initial states will consist of two hadrons 
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in the center-of-mass frame (CMF). The processes involved in one transition (one event) are 
very complicated, and are usually considered to consist of three steps. The generic situation is 
depicted schematically in Fig. 8.1. Time can be considered to flow from the left to the right in 
the picture. The hadrons start the interaction with the emission of partons. The transition of a 
hadron into the emitted parton and the leftover is represented by the white blobs. This is the first 
step. In the second step, represented by the grey blob, the partons interact, resulting in n new 
partons. In step three, these partons turn into jets with high particle multiplicities. The idea is 
that the contribution of the three steps more-or-less factorize in the transition amplitude of the 
whole event, and that the processes can be dealt with separately. In this chapter, we deal with 
step two, the grey blob. 

The multi-jet events that will occur in hadron colliders can be divided into interesting events
(IE) and very interesting events (VIE). The main difference between the two classes is that the
existing model of elementary particles, the standard model, will not yet have proven its
capability of describing the VIE at the moment when they are analyzed. The IE
will not constitute such a heavy test for the standard model. However, we still need
to know the cross sections of the IE in order to compare the ratio of these and those of the VIE
with the predictions of the standard model.

8.1.1 The problem 

A large part of the IE can be described by quantum chromodynamics (QCD), the formalism of
quarks and gluons, with which multi-parton QCD-amplitudes are calculated. It is well known
[35] that they contribute to the cross section with a singular behavior in phase space (PS), given
by the so-called antenna pole structure (APS). In particular, for processes involving only n
gluons the most important contribution comes from the sum of all permutations in the momenta
of

\[ \frac{1}{(p_1\cdot p_2)(p_2\cdot p_3)(p_3\cdot p_4)\cdots(p_{n-1}\cdot p_n)(p_n\cdot p_1)} , \qquad (8.1) \]

and the singular nature stems from the fact that the scalar products $p_i\cdot p_j$ can become very
small. If functions containing this kind of kinematical structure are integrated using RAMBO
(Section 7.4), which generates the momenta distributed uniformly in PS, then a large number of
events is needed to reach a result with acceptable precision. As an illustration, we present in the
left table of Tab. 8.1 the number of PS points needed to integrate the single antenna of Eq. (8.1),
so not even the sum of its permutations, to an expected error of 1 %. The antenna cannot be
integrated over the whole of PS because of the singularities, so these have to be cut out. This
is done through the restriction $(p_i + p_j)^2 \ge s_0$ for all $i, j = 1, \ldots, n$,¹ and in the table the
ratio between $\sqrt{s_0}$ and the total energy $\sqrt{s}$ is given. These numbers are based on the reasonable
choice $s_0/s = 0.2/[n(n-1)]$.

single antenna integrated to 1 % error

  number of    cut-off /     number of
  momenta      CM-energy     PS points
  3            0.183            10,069
  4            0.129            26,401
  5            0.100            58,799
  6            0.0816          130,591
  7            0.0690          240,436
  8            0.0598          610,570

evaluation amplitude in 1 PS point

  number of       cpu-time (seconds)
  final gluons    SPHEL          exact
  3               2.83×10^-?     1.60×10^-?
  4               9.76×10^-5     5.54×10^-1
  5               4.88×10^-4     1.945
  6               3.26×10^-3     6.06
  7               2.57×10^-2     19.91
  8                              64.45

Table 8.1: Typical number of PS points and computing times.

Performing MC integration with very many events is not a problem if the evaluation of the in- 
tegrand in each PS point is cheap in computing time. This is, for example, the case for algorithms 
to calculate the squared multi-parton amplitudes based on the so-called SPHEL-approximation,
for which only the kinematical structure of (8.1) is implemented [35]. Nowadays, algorithms to 
calculate the exact matrix elements exist, which are far more time-consuming [37, 38]. As an 
illustration of what is meant by 'more time-consuming', we present the right table of Tab. 8.1 
with the typical cpu-time needed for the evaluation in one PS point of the integrand for processes 
of two gluons going to more gluons, both for the SPHEL-approximation and the exact matrix el- 
ements [39]. It is expected, and observed, that the exact matrix elements reveal the same kind of 
singularity structures as the APS, so that, according to the tables, the PS integration for a process 
with 8 final gluons would take in the order of 400 days . . . 



8.1.2 The solution 

The solution to this problem is importance sampling. Instead of RAMBO, a PS generator should be 
used which generates momenta with a density including the APS. The following sections show 
the construction of such a PS generator, called SARGE, which stands for Staggered Antenna 
Radiation GEnerator. 

¹ Remember that $(p + q)^2 = 2\,p\cdot q$ since p and q are massless.
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8.2 The basic antenna 

As mentioned before, we want to generate momenta that represent radiated partons with a
density that has the antenna structure $[(p_1p_2)(p_2p_3)(p_3p_4)\cdots(p_{n-1}p_n)(p_np_1)]^{-1}$. Naturally, the
momenta can be viewed as coming from a splitting process: one starts with two momenta, a third
is radiated off creating a new pair of momenta of which a fourth is radiated off and so on. In
fact, models similar to this are used in full-fledged Monte-Carlo generators like HERWIG. Let us
therefore first try to generate a single massless momentum k, radiated from a pair of given
massless momenta $p_1$ and $p_2$. In order for the distribution to have the correct infrared and collinear
behavior, it should qualitatively be proportional to $[(p_1k)(kp_2)]^{-1}$. Furthermore, we want the
density to be invariant under Lorentz transformations and scaling of the momenta, keeping in
mind that the momenta are three out of possibly more in a CMF and that we have to perform
these transformations in the end, like in RAMBO. This motivates us to define the basic antenna
structure as

\[ dA(p_1,p_2;k) := d^4k\,\vartheta(k)\,\frac{1}{\pi}\,\frac{(p_1p_2)}{(p_1k)(kp_2)}\; g\Big(\frac{(p_1k)}{(p_1p_2)}\Big)\, g\Big(\frac{(kp_2)}{(p_1p_2)}\Big) . \qquad (8.2) \]

Here, g is a function that serves to regularize the infrared and collinear singularities, as well as
to ensure normalization over the whole space for k: therefore, $g(\xi)$ has to vanish sufficiently fast
for both $\xi \to 0$ and $\xi \to \infty$. To find out how k could be generated, we evaluate $\int dA$ in the
CMF of $p_1$ and $p_2$. Writing

\[ E := \sqrt{(p_1p_2)/2} , \qquad p := \mathcal{H}_{p_1+p_2}\,p_1 , \qquad q := \mathcal{H}_{p_1+p_2}\,k , \qquad (8.3) \]

we have

\[ (p_1p_2) = 2E^2 , \qquad (p_1k) = Eq^0(1 - z) , \qquad (kp_2) = Eq^0(1 + z) , \qquad (8.4) \]

where $z = \vec p\cdot\vec q/(|\vec p||\vec q|)$. The azimuthal angle of $\vec q$ is denoted $\varphi$, so that $\vec q = |\vec q|\,\mathcal{R}_p^{-1}\hat n(z,\varphi)$, with
$\hat n$ as in (7.50). We can write

\[ d^4k\,\vartheta(k) = \tfrac{1}{2}\,q^0dq^0\,d\varphi\,dz = \tfrac{1}{2}\,(p_1p_2)\,d\varphi\,d\xi_1\,d\xi_2 , \qquad (8.5) \]

where

\[ \xi_1 = \frac{(p_1k)}{(p_1p_2)} \quad\text{and}\quad \xi_2 = \frac{(kp_2)}{(p_1p_2)} , \qquad (8.6) \]

so that $z = (\xi_2 - \xi_1)/(\xi_2 + \xi_1)$ and $q^0 = E(\xi_2 + \xi_1)$. The integral over dA takes on the
particularly simple form

\[ \int dA(p_1,p_2;k) = \Big(\int_0^\infty d\xi\,\frac{1}{\xi}\,g(\xi)\Big)^2 . \qquad (8.7) \]

The antenna $dA(p_1,p_2;k)$ will therefore correspond to a unitary algorithm when we let the
density g be normalized by

\[ \int_0^\infty d\xi\,\frac{1}{\xi}\,g(\xi) = 1 . \qquad (8.8) \]

Note that the normalization of dA fixes the overall factor uniquely: in particular the appearance
of the numerator $(p_1p_2)$ is forced upon us by the unitarity requirement.

For g we want to take, at this point, the simplest possible function we can think of that has a
sufficiently regularizing behavior. We introduce a positive non-zero number $\xi_m$ and take

\[ g(\xi) := \frac{1}{2\log\xi_m}\,\theta(\xi_m^{-1} \le \xi \le \xi_m) . \qquad (8.9) \]

The number $\xi_m$ gives a cut-off for the quotients $\xi_1$ and $\xi_2$ of the scalar products of the momenta,
and not for the scalar products themselves. It is, however, possible to relate $\xi_m$ to the total energy
$\sqrt{s}$ in the CMF and a cut-off $s_0$ on the invariant masses, i.e., the requirement that

\[ (p_i + p_j)^2 \ge s_0 \quad\text{for all momenta } p_i \ne p_j . \qquad (8.10) \]

This can be done by choosing

\[ \xi_m := \frac{s}{s_0} - \frac{(n+1)(n-2)}{2} . \qquad (8.11) \]

With this choice, the invariant masses $(p_1 + k)^2$ and $(k + p_2)^2$ are regularized, but can still be
smaller than $s_0$, so that the whole of PS, cut by (8.10), is covered. The $s_0$ can be derived from
physical cuts $p_T$ on the transverse momenta and $\theta_0$ on the angles between the outgoing momenta:

\[ s_0 = 2p_T^2\cdot\min\Big(1 - \cos\theta_0 ,\; \big(1 + \sqrt{1 - p_T^2/s}\,\big)^{-1}\Big) . \qquad (8.12) \]

With this choice, PS with the physical cuts is covered by PS with the cut of (8.10). To generate
the physical PS, the method of crude Monte Carlo (Section 7.3.2) can be used, i.e., if momenta
of an event do not satisfy the cuts, the whole event is rejected. We end this section with the piece
of the PS algorithm that corresponds to the basic $dA(p_1,p_2;k)$:

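Algorithm 8.2.1 (BASIC ANTENNA)

1. given $\{p_1,p_2\}$, put $p \leftarrow \mathcal{H}_{p_1+p_2}\,p_1$ and put $E \leftarrow \sqrt{(p_1p_2)/2}$;

2. generate two numbers $\xi_1$, $\xi_2$ independently, each from the density $g(\xi)/\xi$, and $\varphi$ uniformly in $[0,2\pi)$;

3. put $z \leftarrow (\xi_2-\xi_1)/(\xi_2+\xi_1)$, $q^0 \leftarrow E(\xi_2+\xi_1)$ and $\vec q \leftarrow q^0\,\mathcal{R}_p^{-1}\hat n(z,\varphi)$;

4. put $k \leftarrow \mathcal{H}_{p_1+p_2}^{-1}\,q$.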
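The steps of the basic antenna can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the thesis implementation: it works directly in the CMF of $p_1$ and $p_2$ with $p_1$ along the positive $z$-axis, so the rotation $\mathcal{R}_p^{-1}$ and the boost $\mathcal{H}_{p_1+p_2}^{-1}$ of steps 3 and 4 are omitted. The density $g(\xi)/\xi$ is taken to be the one of (8.9), so that $\log\xi$ is uniform in $[-\log\xi_m,\log\xi_m]$. The function names `sample_xi`, `basic_antenna_cmf` and `mdot` are hypothetical.

```python
import math
import random


def sample_xi(xi_m, rng):
    """Draw xi with density g(xi)/xi for the cut-off density (8.9):
    log(xi) is uniform in [-log(xi_m), log(xi_m)]."""
    return math.exp((2.0 * rng.random() - 1.0) * math.log(xi_m))


def basic_antenna_cmf(E, xi_m, rng):
    """Generate k in the frame where p1 = (E,0,0,E) and p2 = (E,0,0,-E).

    Returns (k, xi1, xi2) with k = (k0, kx, ky, kz)."""
    xi1 = sample_xi(xi_m, rng)
    xi2 = sample_xi(xi_m, rng)
    phi = 2.0 * math.pi * rng.random()
    z = (xi2 - xi1) / (xi2 + xi1)          # cosine of the angle between p and q
    q0 = E * (xi2 + xi1)                   # energy of the massless momentum
    sin_theta = math.sqrt(max(0.0, 1.0 - z * z))
    k = (q0,
         q0 * sin_theta * math.cos(phi),
         q0 * sin_theta * math.sin(phi),
         q0 * z)
    return k, xi1, xi2


def mdot(a, b):
    """Minkowski product with signature (+,-,-,-)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]
```

In this frame the generated $k$ is massless and reproduces $\xi_1 = (p_1k)/(p_1p_2)$ and $\xi_2 = (kp_2)/(p_1p_2)$, which is a convenient check on an implementation.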
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8.3 A complete QCD antenna 

The straightforward way to generate $n$ momenta with the antenna structured density is by repeated use of the basic antenna. Let us denote

$$ dA^i_{j,k} := dA(q_j,q_k;q_i) , $$

then

$$ dA^2_{1,n}\,dA^3_{2,n}\,dA^4_{3,n}\cdots dA^{n-1}_{n-2,n} = \frac{(q_1q_n)\,g_n(\{q\}_n)}{\pi^{n-2}(q_1q_2)(q_2q_3)(q_3q_4)\cdots(q_{n-1}q_n)} \prod_{i=2}^{n-1} d^4q_i\,\vartheta(q_i) , \tag{8.13} $$

where

$$ g_n(\{q\}_n) := g\!\left(\frac{(q_1q_2)}{(q_1q_n)}\right) g\!\left(\frac{(q_2q_n)}{(q_1q_n)}\right) g\!\left(\frac{(q_2q_3)}{(q_2q_n)}\right) g\!\left(\frac{(q_3q_n)}{(q_2q_n)}\right) \cdots g\!\left(\frac{(q_{n-2}q_{n-1})}{(q_{n-2}q_n)}\right) g\!\left(\frac{(q_{n-1}q_n)}{(q_{n-2}q_n)}\right) . \tag{8.14} $$

So if we have two momenta $q_1$ and $q_n$, then we can easily generate $n-2$ momenta $q_j$ with the antenna structure. Remember that this differential PS volume is completely invariant under Lorentz transformations and scaling transformations, so that it seems self-evident to force the set of generated momenta in the CMF with a given energy, using the same kind of transformation as in the case of RAMBO. If the first two momenta are generated with density $f(q_1,q_n)$, then the UAF tells us that the generated density $A_s^{\mathrm{QCD}}(\{p\}_n)$ satisfies



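$$ \begin{aligned} A_s^{\mathrm{QCD}}(\{p\}_n) = \int & d^4q_1\,\vartheta(q_1)\,d^4q_n\,\vartheta(q_n)\,f(q_1,q_n)\;dA^2_{1,n}\,dA^3_{2,n}\,dA^4_{3,n}\cdots dA^{n-1}_{n-2,n} \\ & \times\, d^4b\,\delta^4\!\left(b - q_{(n)}/m_{q_{(n)}}\right) dx\,\delta\!\left(x - \sqrt{s}/m_{q_{(n)}}\right) \prod_{i=1}^{n} \delta^4(p_i - x\mathcal{H}_bq_i) . \end{aligned} \tag{8.15} $$

If we apply the same manipulations as in the proof of the correctness of RAMBO, we obtain the equation

$$ \begin{aligned} A_s^{\mathrm{QCD}}(\{p\}_n) = \;& \Theta_s(\{p\}_n)\,\frac{(p_1p_n)\,g_n(\{p\}_n)}{\pi^{n-2}(p_1p_2)(p_2p_3)(p_3p_4)\cdots(p_{n-1}p_n)} \\ & \times \int d^4b\,\delta(b^2-1)\,dx\,\frac{2s^2}{x^5}\,f\!\left(x^{-1}\mathcal{H}_b^{-1}p_1 ,\, x^{-1}\mathcal{H}_b^{-1}p_n\right) . \end{aligned} \tag{8.16} $$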

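Now we choose $f$ such that $q_1$ and $q_n$ are generated back-to-back in their CMF with total energy $\sqrt{s}$, i.e.,

$$ f(q_1,q_n) = \frac{2}{\pi}\,\delta^4(q_1+q_n-\sqrt{s}\,e_0) . \tag{8.17} $$

If we evaluate the second line of Eq.(8.16) with this $f$, we arrive at

$$ \frac{4s^2}{\pi} \int dx\,\frac{1}{x^5}\,d^4b\,\delta(b^2-1)\,\delta^4\!\left(x^{-1}\mathcal{H}_b^{-1}(p_1+p_n) - \sqrt{s}\,e_0\right) = \frac{s^2}{2\pi(p_1p_n)^2} , \tag{8.18} $$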
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so that the generated density is given by 

$$ A_s^{\mathrm{QCD}}(\{p\}_n) = \Theta_s(\{p\}_n)\,\frac{s^2}{2\pi^{n-1}}\,\frac{g_n(\{p\}_n)}{(p_1p_2)(p_2p_3)(p_3p_4)\cdots(p_{n-1}p_n)(p_np_1)} . \tag{8.19} $$

Note that, somewhat surprisingly, also the factor $(p_np_1)^{-1}$ comes out, thereby making the antenna even more symmetric. In fact, if the density $f(q_1,q_n) = c^4\exp(-cq_1^0-cq_n^0)/(4\pi^2)$ is taken instead of the one we just used, the calculation can again be done exactly, with exactly the same result. The algorithm to generate $n$ momenta with the above antenna structure is given by

Algorithm 8.3.1 (QCD ANTENNA)

1. generate massless momenta $q_1$ and $q_n$;

2. generate $n-2$ momenta $q_j$ by the basic antennas $dA^2_{1,n}\,dA^3_{2,n}\,dA^4_{3,n}\cdots dA^{n-1}_{n-2,n}$;

3. compute $q_{(n)} := \sum_{j=1}^{n} q_j$, and the boost and scaling transforms that bring $q_{(n)}$ to $\sqrt{s}\,e_0$;

4. for $j = 1,\ldots,n$, boost and scale the $q_j$ accordingly, into the $p_j$.

Usually, the event generator is used to generate cut PS. If a generated event does not satisfy the physical cuts, it is rejected. In the calculation of the weight coming with an event, the only contribution coming from the functions $g$ is, therefore, their normalization. In total, this gives a factor $1/(2\log\xi_m)^{2n-4}$ in the density.
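Steps 3 and 4 of Algorithm 8.3.1 can be illustrated with a short Python sketch. It assumes the standard RAMBO-type transformation: a Lorentz boost to the rest frame of the total momentum $q_{(n)}$, followed by a rescaling with $x = \sqrt{s}/m_{q_{(n)}}$. The function name `boost_and_scale` and the tuple layout `(E, px, py, pz)` are choices made here for illustration only.

```python
import math


def boost_and_scale(qs, sqrt_s):
    """Boost the momenta qs to the rest frame of their sum Q and rescale
    them such that the total energy becomes sqrt_s.

    Each momentum is a tuple (E, px, py, pz)."""
    Q = [sum(q[mu] for q in qs) for mu in range(4)]
    M = math.sqrt(Q[0] ** 2 - Q[1] ** 2 - Q[2] ** 2 - Q[3] ** 2)
    x = sqrt_s / M                                   # scaling factor
    ps = []
    for q in qs:
        # energy of q in the rest frame of Q: (Q.q)/M
        e = (Q[0] * q[0] - Q[1] * q[1] - Q[2] * q[2] - Q[3] * q[3]) / M
        # coefficient of the spatial part of the boost
        c = (q[0] + e) / (Q[0] + M)
        ps.append((x * e,
                   x * (q[1] - c * Q[1]),
                   x * (q[2] - c * Q[2]),
                   x * (q[3] - c * Q[3])))
    return ps
```

After the transformation the momenta sum to $\sqrt{s}\,e_0$ and remain massless, which can be checked numerically for any set of massless input momenta.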



8.4 Incoming momenta and symmetrization 

The density given by the algorithm above is not quite what we want. First of all, we want to include the incoming momenta $p_0$ and $\tilde p_0$ in the APS, so that the density becomes proportional to $[(\tilde p_0p_1)(p_1p_2)\cdots(p_{n-1}p_n)(p_np_0)]^{-1}$ instead of $[(p_1p_2)\cdots(p_{n-1}p_n)(p_np_1)]^{-1}$. Then we want the sum of all permutations of the momenta, including the incoming ones.

8.4.1 Generating incoming momenta 

The incoming momenta can be generated after the antenna has been generated. To show how, let us introduce the following "regularized" scalar product:

$$ (pq)_\delta := (pq) + \delta\,p^0q^0 , \tag{8.20} $$

where $\delta$ is a small positive number. This regularization is not completely Lorentz invariant, but that does not matter here. Important is that it is still invariant under rotations, as we shall see. Using this regularization, we are able to generate a momentum $k$ with a probability density

$$ \frac{1}{2\pi I_\delta(p_1,p_2)}\,\frac{\vartheta(k)\,\delta(k^0-1)}{(p_1k)_\delta\,(\tilde kp_2)_\delta} . \tag{8.21} $$
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To show how, we calculate the normalization I6(pi,p2). Using the Feynman-representation of 
1 /[(pilc)6(kp2)6], it is easy to see that 



1 



47rp°p° 



dzdcp 



dx 



(l+6-|pJz)2 • 



(8.22) 



where Px = xpi + (x — 1 )p2- The integral over z and (p can now be performed, with the result 
that 



1 

hiv^.V2) = -0-0 



dx 



1 



(1+6)2-|pxP 2(pip2) 



dx 







(8.23) 



where x± are the solutions for x of the equation 1 + 6 = |px|. Further evaluation finally leads to 



x_ 



11 / 2p»p°(26 + 5^) 



x+ — x_ 

Notice that there is a smooth limit to the case in which pi and p2 are back-to-back: 

1 



(8.24) 



(8.25) 



The algorithm to generate k can be derived by reading the evaluations of the integrals backwards. 

Because $k$ and $\tilde k$ are back-to-back, they can serve as the incoming momenta. To fix them to $e_0+e_3$ and $e_0-e_3$, the whole system of momenta can be rotated. If we generate momenta with the density $A_s^{\mathrm{QCD}}$, use the first two momenta to generate the incoming momenta and rotate, we get a density



Ds({p)n) = 



d4-qAr({q}n]d4k 



1 



^(k)5(kO-r 



= Ar({pV]i5(pi,P2: 



-1 



27tl6(qi,q2) (qik)5(q2k]5 

(27t)-i 



(8.26) 



where we used the fact that the whole expression is invariant under rotations, and that these are 
orthogonal transformations. The last line of the previous expression can be evaluated further 
with the result that 



$$ D_s(\{p\}_n) = A_s^{\mathrm{QCD}}(\{p\}_n)\,\frac{1}{I_\delta(p_1,p_2)}\,\frac{1}{(p_1p_0)_\delta\,(\tilde p_0p_2)_\delta} \qquad\text{with}\quad p_0 = e_0+e_3 ,\;\; \tilde p_0 = e_0-e_3 . \tag{8.27} $$

The algorithm to generate the incoming momenta is given by

Algorithm 8.4.1 (INCOMING MOMENTA)

1. given a pair $\{p_1,p_2\}$, calculate $x_+$ and $x_-$;

2. generate $x$ in $[0,1]$ with density $\sim [(x_+-x)(x-x_-)]^{-1}$, and put $p_x \leftarrow xp_1 + (x-1)p_2$;

3. generate $\varphi$ uniformly in $[0,2\pi)$, $z$ in $[-1,1]$ with density $\sim (1+\delta-|p_x|z)^{-2}$;

4. put $\vec k \leftarrow \mathcal{R}_{p_x}^{-1}\hat n(z,\varphi)$ and $k^0 \leftarrow 1$;

5. rotate all momenta with $\mathcal{R}_k$;

6. put $p_0 \leftarrow \tfrac12\sqrt{s}\,(e_0+e_3)$ and $\tilde p_0 \leftarrow \tfrac12\sqrt{s}\,(e_0-e_3)$.

Notice that $I_\delta(p_1,p_2)(p_1p_0)_\delta(\tilde p_0p_2)_\delta$ is invariant under the scaling $p_1,p_2 \to cp_1,cp_2$ with a constant $c$, so that scaling of $p_0$ and $\tilde p_0$ has no influence on the density.

The pair $(q_1,q_2)$ with which $k$ is generated is free to choose because we want to symmetrize in the end anyway. We should only choose it such that we get rid of the factor $(q_1q_2)$ in the denominator of $A_s^{\mathrm{QCD}}(\{q\}_n)$.



8.4.2 Choosing the type of antenna with incoming momenta 

A density which is the sum over permutations can be obtained by generating random permutations, and returning the generated momenta with permuted labels. This, however, only makes sense for the outgoing momenta. The incoming momenta are fixed, and should be returned separately from the outgoing momenta by the event generator. Therefore, a part of the permutations has to be generated explicitly. There are two kinds of terms in the sum: those in which $(p_0\tilde p_0)$ appears, and those in which it does not.

Case 1: antenna with $(p_0\tilde p_0)$. To generate the first kind, we can choose a label $i$ at random with weight $(p_ip_{i+1})/\Sigma_1(\{p\}_n)$, where $\Sigma_1(\{p\}_n)$ is the sum of all scalar products in the antenna (see the footnote on $i+1$):

$$ \Sigma_1(\{p\}_n) := \sum_{i=1}^{n} (p_ip_{i+1}) . \tag{8.28} $$

This is a proper weight, since all scalar products are positive. The total density gets this extra factor then, so that $(p_ip_{i+1})$ cancels. The denominator of the weight factor does not give a problem, because its singular structure is much softer than the one of the antenna. The pair $\{p_i,p_{i+1}\}$ can then be used to generate the incoming momenta, as shown above. So in this case, a density $A_s^{\mathrm{QCD}}(\{p\}_n)B_1(\{p\}_n)/\Sigma_1(\{p\}_n)$ is generated, where

$$ B_1(\{p\}_n) := \sum_{i=1}^{n} \frac{(p_ip_{i+1})}{I_\delta(p_i,p_{i+1})\,(p_ip_0)_\delta\,(\tilde p_0p_{i+1})_\delta} . \tag{8.29} $$

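Case 2: antenna without $(p_0\tilde p_0)$. To generate the second kind, we can choose two non-equal labels $i$ and $j$ with weight $(p_ip_{i+1})(p_jp_{j+1})/\Sigma_2(\{p\}_n)$, where

$$ \Sigma_2(\{p\}_n) := \sum_{i\ne j} (p_ip_{i+1})(p_jp_{j+1}) . \tag{8.30} $$

(Footnote: read $i+1$ mod $n$ when $i+1$ occurs in this section.)

Next, a pair $(k,l)$ of labels has to be chosen from the set of pairs

$$ \{(i,j)\}_+ := \{ (i,j),\, (i,j+1),\, (i+1,j),\, (i+1,j+1) \} . \tag{8.31} $$

If this is done with weight $I_\delta(p_k,p_l)/\Sigma_{ij}(\{p\}_n)$, where

$$ \Sigma_{ij}(\{p\}_n) := \sum_{(k,l)\in\{(i,j)\}_+} I_\delta(p_k,p_l) , \tag{8.32} $$

then the density $A_s^{\mathrm{QCD}}(\{p\}_n)B_2(\{p\}_n)/\Sigma_2(\{p\}_n)$ is generated, where

$$ \begin{aligned} B_2(\{p\}_n) &= \sum_{i\ne j} (p_ip_{i+1})(p_jp_{j+1}) \sum_{(k,l)\in\{(i,j)\}_+} \frac{I_\delta(p_k,p_l)}{\Sigma_{ij}(\{p\}_n)}\,\frac{1}{I_\delta(p_k,p_l)\,(p_kp_0)_\delta(\tilde p_0p_l)_\delta} \\ &= \sum_{i\ne j} \frac{(p_ip_{i+1})(p_jp_{j+1}) \sum_{(k,l)\in\{(i,j)\}_+} (p_kp_0)_\delta(\tilde p_0p_l)_\delta}{(p_ip_0)_\delta(p_{i+1}p_0)_\delta(\tilde p_0p_j)_\delta(\tilde p_0p_{j+1})_\delta \sum_{(k,l)\in\{(i,j)\}_+} I_\delta(p_k,p_l)} . \end{aligned} \tag{8.33} $$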



Before all this, we first have to choose between the two cases, and the natural way to do this is with relative weights $\tfrac12 s\,\Sigma_1(\{p\}_n)$ and $\Sigma_2(\{p\}_n)$, so that the complete density is equal to

$$ \frac{1}{n!} \sum_{\rm perm} A_s^{\mathrm{QCD}}(\{p\}_n)\,\frac{\tfrac12 s\,B_1(\{p\}_n) + B_2(\{p\}_n)}{\tfrac12 s\,\Sigma_1(\{p\}_n) + \Sigma_2(\{p\}_n)} , \tag{8.34} $$

where the first sum is over all permutations of $(1,\ldots,n)$. One can, of course, try to optimize the weights for the two cases using the adaptive multichannel method (Section 7.3.5). The result of using the sum of the two densities is that the factors $(p_ip_{i+1})$ in the numerator of $B_1(\{p\}_n)$ and $(p_ip_{i+1})(p_jp_{j+1})$ in the numerator of $B_2(\{p\}_n)$ cancel with the same factors in the denominator of $A_s^{\mathrm{QCD}}(\{p\}_n)$, so that we get exactly the pole structure we want. The 'unwanted' singularities in $B_1(\{p\}_n)$, $B_2(\{p\}_n)$ and $\Sigma_1(\{p\}_n)$, $\Sigma_2(\{p\}_n)$ are much softer than the ones remaining in $A_s^{\mathrm{QCD}}(\{p\}_n)$, and cause no trouble. The algorithm to generate the incoming momenta and the permutation is given by

Algorithm 8.4.2 (CHOOSE INCOMING POLE STRUCTURE)

1. choose case 1 or 2 with relative weights $\tfrac12 s\,\Sigma_1(\{p\}_n)$ and $\Sigma_2(\{p\}_n)$;

2. in case 1, choose $i_1$ with relative weight $(p_{i_1}p_{i_1+1})$ and put $i_2 \leftarrow i_1+1$;

3. in case 2, choose $(i,j)$ with $(i \ne j)$ and relative weight $(p_ip_{i+1})(p_jp_{j+1})$, and then choose $(i_1,i_2)$ from $\{(i,j)\}_+$ with relative weight $I_\delta(p_{i_1},p_{i_2})$;

4. use $\{p_{i_1},p_{i_2}\}$ to generate the incoming momenta with Algorithm 8.4.1;

5. generate a random permutation $\sigma \in S_n$ and put $p_i \leftarrow p_{\sigma(i)}$ for all $i = 1,\ldots,n$.

An algorithm to generate the random permutations can be found in [2]. An efficient algorithm to calculate a sum over permutations can be found in [36].



8.5 Improvements

When doing calculations with this algorithm on a PS, cut such that $(p_i+p_j)^2 > s_0$ for all $i \ne j$ and some reasonable $s_0 > 0$, we notice that a very high percentage of the generated events does not pass the cuts. An important reason why this happens is that the cuts, generated by the choices of $g$ (Eq. (8.9)) and $\xi_m$ (Eq. (8.11)), are implemented only on quotients of scalar products that appear explicitly in the generation of the QCD-antenna:

$$ \xi^i_1 = \frac{(p_{i-1}p_i)}{(p_{i-1}p_n)} \quad\text{and}\quad \xi^i_2 = \frac{(p_ip_n)}{(p_{i-1}p_n)} , \qquad i = 2,3,\ldots,n-1 . \tag{8.35} $$

The total number of these $\xi$-variables is

$$ n_\xi := 2n-4 , \tag{8.36} $$

and the cuts are implemented such that $\xi_m^{-1} \le \xi^i_{1,2} \le \xi_m$ for $i = 2,3,\ldots,n-1$. We show now how these cuts can be implemented on all quotients

$$ \frac{(p_{i-1}p_i)}{(p_{j-1}p_j)} , \qquad \frac{(p_{i-1}p_i)}{(p_jp_n)} \qquad\text{and}\qquad \frac{(p_ip_n)}{(p_jp_n)} , \qquad i,j = 2,3,\ldots,n-1 . \tag{8.37} $$

We define the $m$-dimensional convex polytope

$$ P_m := \{ (x_1,\ldots,x_m) \in [-1,1]^m \;|\; |x_i-x_j| < 1 \;\;\forall\, i,j = 1,\ldots,m \} , \tag{8.38} $$

and replace the generation of the $\xi$-variables by the following:

Algorithm 8.5.1 (IMPROVEMENT)

1. generate $(x_1,x_2,\ldots,x_{n_\xi})$ distributed uniformly in $P_{n_\xi}$;

2. define $x_0 := 0$ and put

$$ \xi^i_1 \leftarrow e^{(x_{2i-3}-x_{2i-4})\log\xi_m} , \qquad \xi^i_2 \leftarrow e^{(x_{2i-2}-x_{2i-4})\log\xi_m} \tag{8.39} $$

for all $i = 2,\ldots,n-1$.

Because all the variables $x_i$ are distributed uniformly such that $|x_i-x_j| < 1$, all quotients of (8.37) are distributed such that they are between $\xi_m^{-1}$ and $\xi_m$. In terms of the variables $x_i$, this means that we generate the volume of $P_{n_\xi}$, which is $n_\xi+1$, instead of the volume of $[-1,1]^{n_\xi}$, which is $2^{n_\xi}$. In Section 8.8, we give the algorithm to generate variables distributed uniformly in $P_m$. We have to note here that this improvement only makes sense because the algorithm to generate these variables is very efficient. The total density changes such that the function $g_n$ in Eq.(8.19) has to be replaced by

$$ g^P_n(\xi_m;\{\xi\}) := \frac{1}{(n_\xi+1)(\log\xi_m)^{n_\xi}}\,\theta\big( (x_1,\ldots,x_{n_\xi}) \in P_{n_\xi} \big) , \tag{8.40} $$

where the variables $x_i$ are functions of the variables $\xi^i_{1,2}$ as defined by (8.39). Because crude MC is used to restrict generated events to cut PS, again only the normalization has to be calculated for the weight of an event.

With this improvement, still a large number of events does not pass the cuts. The situation with PS is depicted in Fig. 8.2. Phase space contains generated phase space which contains cut phase space. The problem is that most events fall in the shaded area, which is the piece of generated PS that is not contained in cut PS. To get a higher percentage of accepted events, we use a random variable $\xi_v \in [1,\xi_m]$, instead of the fixed number $\xi_m$, to generate the variables $\xi^i_{1,2}$. This means that the size of the generated PS becomes variable. If this is done with a probability distribution such that $\xi_v$ can, in principle, become equal to $\xi_m$, then the whole of cut phase space is still covered. We suggest the following, tunable, density:

$$ H_\alpha(\xi_v) = \frac{\alpha+1}{(\log\xi_m)^{\alpha+1}}\,\frac{(\log\xi_v)^\alpha}{\xi_v}\,\theta(1 \le \xi_v \le \xi_m) , \qquad \alpha \ge 0 . \tag{8.41} $$

If $\alpha = 0$, then $\log\xi_v$ is distributed uniformly in $[0,\log\xi_m]$, and for larger $\alpha$, the distribution peaks more and more towards $\xi_v = \xi_m$. Furthermore, the variable is easy to generate and the total generated density can be calculated exactly: $g^P_n(\xi_m;\{\xi\})$ should be replaced by

$$ G^P_n(\alpha,\xi_m;\{\xi\}) := \frac{\alpha+1}{(n_\xi+1)(\log\xi_m)^{\alpha+1}} \int_{\log\xi_{\rm low}}^{\log\xi_m} dx\,x^{\alpha-n_\xi} , \tag{8.42} $$

where $\xi_{\rm low}$ is the maximum of the ratios of scalar products in (8.37).
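The remark that $\xi_v$ is easy to generate can be made concrete with a small Python sketch. It assumes a density proportional to $(\log\xi_v)^\alpha/\xi_v$ on $[1,\xi_m]$, which is how (8.41) is read here; the cumulative distribution of $\log\xi_v$ is then $(\log\xi_v/\log\xi_m)^{\alpha+1}$ and can be inverted directly. The function name `sample_xi_v` is hypothetical.

```python
import math
import random


def sample_xi_v(xi_m, alpha, rng):
    """Draw xi_v in [1, xi_m] with density ~ (log xi_v)^alpha / xi_v
    by inverting the cumulative distribution of log(xi_v)."""
    rho = rng.random()
    return math.exp(math.log(xi_m) * rho ** (1.0 / (alpha + 1.0)))
```

For $\alpha = 0$ this reduces to a uniform distribution of $\log\xi_v$, and for large $\alpha$ the samples concentrate near $\xi_m$, in line with the behavior described above.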



8.6 Results and conclusions 

We compare SARGE with RAMBO in the integration of the SPHEL-integrand for processes of the kind $gg \to ng$, which is given by

^ (P 1P2) (P2P3) (P3P4) ■ ■ ■ (PnPn+l ) (Pn+lPn+2) (Pn+2P 1 ) 



where $p_1$ and $p_2$ are the incoming momenta, and the first sum is over all permutations of $(2,3,\ldots,n+2)$ except the cyclic permutations. The results are presented in Tab. 8.3.

n                  4             5             6             7
τ_SPHEL (s)        5.40×10^-5    2.70×10^-4    1.80×10^-3    1.41×10^-2
τ_exact (s)        3.07×10^-1    1.08          3.35          10.92

Table 8.2: cpu-times ($\tau_{\rm SPHEL}$) in seconds needed to evaluate the SPHEL-integrand one time with a 300-MHz UltraSPARC-IIi processor, and the cpu-times ($\tau_{\rm exact}$) needed to evaluate the exact integrand, estimated with the help of Tab. 8.1.

The calculations were done at a CM-energy $\sqrt{s} = 1000$ with cuts $p_T = 40$ on each transverse momentum and $\theta_0 = 30^\circ$ on the angles between the momenta. We present the results for $n = 4, 5, 6, 7$, calculated with RAMBO and SARGE with different values for $\alpha$ (Eq. (8.42)). The value of $\sigma$ is the estimate of the integral at an estimated error of 1% for $n = 4, 5, 6$ and 3% for $n = 7$. These numbers are only printed to show that different results are compatible. Remember that they are not the whole cross sections: flux factors, color factors, sums and averages over helicities, and coupling constants are not included. The other data are the number of generated events ($N_{ge}$), the number of accepted events ($N_{ac}$) that passed the cuts, the cpu-time consumed ($t_{cpu}$), and the cpu-time the calculation would have consumed if the exact matrix element had been used ($t_{exa}$), both in hours. This final value is estimated with the help of Tab. 8.2 and the formula

$$ t_{exa} = t_{cpu} + N_{ac}\,(\tau_{\rm exact} - \tau_{\rm SPHEL}) , \tag{8.44} $$

where $\tau_{\rm exact}$ and $\tau_{\rm SPHEL}$ are the cpu-times it takes to evaluate the squared matrix element once. Remember that the integrand only has to be evaluated for accepted events. The calculations have been performed with a single 300-MHz UltraSPARC-IIi processor.
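As a worked illustration of the estimate (8.44), the following Python sketch recomputes $t_{exa}$ from the tabulated quantities. The only assumption added here is the unit conversion: $t_{cpu}$ and $t_{exa}$ are quoted in hours while the evaluation times of Tab. 8.2 are in seconds. The function name `exact_time_hours` is hypothetical.

```python
def exact_time_hours(t_cpu_h, n_accepted, tau_exact_s, tau_sphel_s):
    """Estimate the cpu-time (hours) with the exact matrix element:
    t_exa = t_cpu + N_ac * (tau_exact - tau_sphel), with the evaluation
    times given in seconds and converted to hours."""
    return t_cpu_h + n_accepted * (tau_exact_s - tau_sphel_s) / 3600.0


# Entries for gg -> 4g: RAMBO and SARGE with alpha = 0.0
t_rambo = exact_time_hours(0.198, 3065227, 3.07e-1, 5.40e-5)
t_sarge = exact_time_hours(0.0254, 111320, 3.07e-1, 5.40e-5)
```

With these inputs the sketch gives about 262 hours for RAMBO and about 9.5 hours for SARGE, in agreement with the tabulated values.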

The first conclusion we can draw is that SARGE outperforms RAMBO in computing time for all processes. This is especially striking for lower numbers of outgoing momenta, and this behavior has a simple explanation: we kept the CM-energy and the cuts fixed, so that there is less energy to distribute over the momenta if $n$ is larger, and the cuts become relatively tighter. As a result, RAMBO gains on SARGE if $n$ becomes larger. This effect would not appear if the energy, or the cuts, were scaled with $n$ as in Tab. 8.1. Another indication for this effect is the fact that the ratio $N_{ac}/N_{ge}$ for RAMBO, which estimates the ratio of the volumes of cut PS and whole PS, decreases with $n$.

Another conclusion that can be drawn is that SARGE performs better if $\alpha$ is larger. Notice that the limit of $\alpha \to \infty$ is equivalent with dropping the improvement of the algorithm using the variable $\xi_v$ (Eq.(8.42)). Only if the integrand becomes too flat, as in the case of $n = 7$ with the energy and the cuts as given in the table, smaller values are preferable. Then, too many events do not pass the cuts if $\alpha$ is large.

As an extra illustration of the performance of SARGE, we present in Fig. 8.3 the evaluation of MC-integrals as function of the number of accepted events. Depicted are the integral $\sigma$ with the bounds on the expected deviation coming from the estimated expected error, and the relative error. Especially the graphs with the relative error are illustrative, since they show that it converges to zero more smoothly for SARGE than for RAMBO. Notice the spike for RAMBO around $N_{ac} = 25000$, where an event obviously hits a singularity.

gg → 4g, 1% error
alg.         RAMBO        SARGE, α = 0.0   SARGE, α = 0.5   SARGE, α = 10.0
σ            4.30×10^8    4.31×10^8        4.37×10^8        4.32×10^8
N_ge         4,736,672    296,050          278,702          750,816
N_ac         3,065,227    111,320          40,910           23,373
t_cpu (h)    0.198        0.0254           0.0172           0.0348
t_exa (h)    262          9.52             3.51             2.03

gg → 5g, 1% error
alg.         RAMBO        SARGE, α = 0.0   SARGE, α = 0.5   SARGE, α = 10.0
σ            3.78×10^10   3.81×10^10       3.80×10^10       3.81×10^10
N_ge         4,243,360    715,585          1,078,129        6,119,125
N_ac         1,712,518    167,540          36,385           21,111
t_cpu (h)    0.286        0.133            0.0758           0.277
t_exa (h)    514          51.6             11.7             9.10

gg → 6g, 1% error
alg.         RAMBO        SARGE, α = 0.0   SARGE, α = 0.5   SARGE, α = 10.0
σ            3.07×10^12   3.05×10^12       3.13×10^12       3.05×10^12
N_ge         3,423,981    2,107,743        6,136,375        68,547,518
N_ac         700,482      276,344          34,095           17,973
t_cpu (h)    0.685        1.32             0.471            3.17
t_exa (h)    653          258              32.2             19.9

gg → 7g, 3% error
alg.         RAMBO        SARGE, α = 0.0   SARGE, α = 0.5   SARGE, α = 10.0
σ            2.32×10^14   2.16×10^14       2.20×10^14       2.28×10^14
N_ge         605,514      710,602          5,078,153        125,471,887
N_ac         49,915       42,394           3,256            1,789
t_cpu (h)    0.224        1.86             0.452            6.74
t_exa (h)    152          130              10.3             12.2

Table 8.3: Results for the integration of the SPHEL-integrand. The CM-energy and the cuts used are $\sqrt{s} = 1000$, $p_T = 40$ and $\theta_0 = 30^\circ$. Presented are the final result ($\sigma$), the number of generated ($N_{ge}$) and accepted ($N_{ac}$) events, the cpu-time ($t_{cpu}$) in hours, and the cpu-time ($t_{exa}$) it would take to integrate the exact matrix element, estimated with the help of Tab. 8.2. In the calculation of this table, adaptive multichanneling in the two cases of Section 8.4.2 was used, and $\delta = 0.01$ (Section 8.4.1).

Figure 8.3: The convergence of the MC-process in the integration of the SPHEL-integrand for $n = 5$, with $\sqrt{s} = 1000$, $p_T = 40$ and $\theta_0 = 30^\circ$. The upper graphs show the integral itself as function of the number of accepted events, together with the estimated bounds on the expected deviations. The lower graphs show the relative error. SARGE was used with adaptive multichanneling in the two cases of Section 8.4.2, with $\delta = 0.01$ (Section 8.4.1) and without the variable $\xi_v$. The number of generated events was 6,699,944, and the cpu-time was 0.308 hours.

8.7 Other pole structures 

The APS of (8.1) is not the only pole structure occurring in the squared amplitudes of QCD-processes; not even in purely gluonic processes. For example, in the case of $gg \to 4g$, also permutations of

$$ \frac{1}{(p_1p_3)(p_2p_4)(p_0p_1)(\tilde p_0p_2)(p_0-p_1-p_3)^2} \tag{8.45} $$

occur [35]. If one is able to generate momenta with this density, it can be included in the whole density with the use of the adaptive multichannel technique. In the interpretation of the transition amplitude as a sum of Feynman diagrams, this kind of pole structure typically comes from $t$-channel diagrams, which are of the type

[$t$-channel diagram]

and where, for this case, $Q_1 = p_1+p_3$ and $Q_2 = p_2+p_4$, so that $k = p_0-p_1-p_3$. The natural way to generate a density with this pole structure is by generating $s_i := Q_i^2$ with a density proportional to $1/s_i$, a variable $t$ that plays the role of $(p_0-p_1-p_3)^2$, construct with this and some generated angles the momenta $Q_i$, and then split new momenta from each of these. For $n = 4$, only two momenta have to split off each $Q_i$, and there is a reasonably simple algorithm to generate these.

We shall now just present the algorithm that generates the density (8.56), and then show its correctness using the UAF. If we mention the generation of some random variable $x$ 'with a density $f(x)$' in the following, we mean a density that is proportional to $f(x)$, and we shall not always write down the normalization explicitly. Furthermore, $s$ denotes the square of the CM-energy and $\lambda := \lambda(s,s_1,s_2)$ the usual Mandelstam variable

$$ \lambda := s^2 + s_1^2 + s_2^2 - 2ss_1 - 2ss_2 - 2s_1s_2 . \tag{8.46} $$

Of course, a cut has to be implemented in order to generate momenta following (8.45), and we shall be able to put $(p_ip_j) > \tfrac12 s_0$ for the scalar products occurring in the denominator, where $s_0$ only has to be larger than zero. To generate the momenta with density (8.45), one should

Algorithm 8.7.1 (T-CHANNEL)

1. generate $s_1$ and $s_2$ between $s_0$ and $s$ with density $1/s_1$ and $1/s_2$;

2. generate $t$ between $s-s_1-s_2 \pm \sqrt{\lambda(s,s_1,s_2)}$ with density $1/[t(t+2s_1)(t+2s_2)]$;

3. put $z \leftarrow (s-s_1-s_2-t)/\sqrt{\lambda}$ and generate $\varphi$ uniformly in $[0,2\pi)$;

4. put $Q_1 \leftarrow \left(\sqrt{s_1+\lambda/(4s)},\, \sqrt{\lambda/(4s)}\,\hat n(z,\varphi)\right)$ and $Q_2 \leftarrow \sqrt{s}\,e_0 - Q_1$;

5. for $i = 1,2$, generate $z_i < 1-4s_0/(t+2s_i)$ with density $1/(1-z_i)$ and $\varphi_i$ uniformly in $[0,2\pi)$, and put $q_i \leftarrow \tfrac12\sqrt{s_i}\,(1,\hat n(z_i,\varphi_i))$;

6. for $i = 1,2$, rotate $q_i$ to the CMF of $Q_i$, then boost it to the CMF of $Q_1+Q_2$ to obtain $p_i$, and put $p_{i+2} \leftarrow Q_i - p_i$;
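Steps 1 and 5 of the algorithm draw variables from densities of the type $1/x$, which can be done by inversion. The sketch below is an illustration in Python and not the thesis code. It assumes that the bound in step 5 is the one implied by the cut $(p_0p_i) > \tfrac12 s_0$ together with $(t+2s_i)(1-z_i) = 8(p_0p_i)$, i.e. $1-z_i > 4s_0/(t+2s_i)$, and that this bound is smaller than 2. The function names `sample_inverse` and `sample_z` are hypothetical.

```python
import random


def sample_inverse(lo, hi, rng):
    """Draw x in [lo, hi] with density proportional to 1/x (inversion):
    log(x) is uniform in [log(lo), log(hi)]."""
    return lo * (hi / lo) ** rng.random()


def sample_z(s0, t, s_i, rng):
    """Draw z in [-1, 1 - 4*s0/(t + 2*s_i)] with density ~ 1/(1 - z).

    The variable u = 1 - z has density ~ 1/u on [u_min, 2]."""
    u_min = 4.0 * s0 / (t + 2.0 * s_i)
    return 1.0 - sample_inverse(u_min, 2.0, rng)
```

A call such as `sample_inverse(s0, s, rng)` then produces the invariants $s_1$ and $s_2$ of step 1.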

As a final step, the incoming momenta can be put to $p_0 \leftarrow \tfrac12\sqrt{s}\,(e_0+e_3)$ and $\tilde p_0 \leftarrow \tfrac12\sqrt{s}\,(e_0-e_3)$. The variables $s_i$ and $z_i$ can easily be obtained by inversion (Section 7.3.1). The variable $t$ can best be obtained by generating $x := \tfrac12\log(4s_1s_2) - \log t$ with the help of the rejection method (Section 7.3.3). In the UAF, the steps of the algorithm read as follows. Denoting



and 



£1 := eo + 63 



nrm(s,Si,S2) := 



£2 := eo — 63 



dt 

t(t + 2si)(t + 2s2) 



h± := s — Si — $2 ± Vx , 



(8.47) 



e(H_ < t < H+) 



we have 
1. 

2. 
3. 
4. 



1/4 

Si - S2 



dsi ds2 0{so < S] 2 < s] 



1 , 1 +2s2/H_ 1 , 1 +2si/h_ 
log TT^S 7Z log 



S2 l+2s2/h,+ Si 1+2si/h+ 



(8.48) 



Si S2 (log^)^ 

dt e(H_ < t < h.+) 

t(t + 2si)(t + 2s2) nrm(s,si,S2) 
s — Si — S2 — 1\ dcp 



dz 6 z — 



d^Qi 6 (q? - ^Si+Aj 6M _ y^n(z, cp) ) d''Q2b\Q^ + Q2- v^eo) 



YZT- ,,,t+2s. ^ dV6(q°-^^)6^(qi-q?n(zi, Vi)) 



271 



Yl d%i b\loi - :Kq, £i] b\vi - ^q,K! qt) 6'(Pi+2 + Pi - Qi) 



i=1 
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The various assignments imply the following identities. First of all, we have 

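$$ (p_i+p_{i+2})^2 = Q_i^2 = s_i . \tag{8.49} $$

Using that $4ss_1 + \lambda = (s+s_1-s_2)^2$ we find

$$ \sqrt{4s}\,(\epsilon_1\cdot Q_1) = s+s_1-s_2-z\sqrt{\lambda} = t+2s_1 \tag{8.50} $$

and the same for $(1 \leftrightarrow 2)$, so that

$$ t = 4(p_0\cdot Q_1) - 2(p_1+p_3)^2 = -2(p_0-p_1-p_3)^2 . \tag{8.51} $$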
Denote := CRi,i^Qi> so that qt = ^QiVi- Because £q. ~ £i, we find that 

_ 2(£TqO (£iXq,pO (£rpO 

so that 

$$ (t+2s_1)(1-z_1) = 8(p_0\cdot p_1) \quad\text{and}\quad (t+2s_2)(1-z_2) = 8(\tilde p_0\cdot p_2) . \tag{8.53} $$

We can conclude so far that the algorithm generates the correct pole structure. For the further evaluation of the integrals one can forget about the factors $s_i$, $t$, $t+2s_i$ and $1-z_i$ in the denominators. Using that



dV6(q,"-|ViI)6nqi-qin(zi,(Pi)) = 2d\M^i)^'{^qi-n{zi,iPi]] , (8.54) 
and replacing step 4 by 

n2^si + A d4Q,^sJQi) j 6(z(Qi) - z) 6((p(Qi) - (p) S^fQi + Qi -V^eo) , (8.55) 

the integrals can easily be performed backwards, i.e., in the order q^, cpi, Zi, bi, Qi, cp, z, t, Si, 
S2. The density finally is 

e(2(poPl) > So] e(2(poP2) > So] e(2(piP3) > so] e(2(p2P4] > so) 

0s({p}4j : — 7^- 



X 



24(27t) 



(PoPi ) {V0V2] (P1P3) (P2P4] [-(Po - Pi - P3 
^(log^)^og(i±|^) log(l±|^)nrm(s,Si,S2) 



, (8.56) 



where $s_i := (p_i+p_{i+2})^2$ and $t := -2(p_0-p_1-p_3)^2$.

8.8 Generating a uniform distribution inside a polytope

We consider the $m$-dimensional convex polytope $P_m$ defined in (8.38). The task is to generate $m$-dimensional points uniformly inside $P_m$. A straightforward way is to generate points inside the hypercube $[-1,1]^m$ and implement the other conditions by rejection, with an efficiency given by $\mathrm{Vol}(P_m)/2^m$, where $\mathrm{Vol}(P_m)$ is the volume of the polytope. This may, however, become slow if $\mathrm{Vol}(P_m)$ does not increase fast with $m$. Let us, therefore, compute $\mathrm{Vol}(P_m)$. We distinguish positive and negative $x_i$ values. Define

$$ V_m(k) := \int_{P_m} dx_1\cdots dx_m\,\theta(x_{1,2,\ldots,k} < 0)\,\theta(x_{k+1,\ldots,m} > 0) . \tag{8.57} $$

We then have

$$ \mathrm{Vol}(P_m) = \sum_{k=0}^{m} \frac{m!}{k!(m-k)!}\,V_m(k) . \tag{8.58} $$

In the calculation of $V_m(k)$ we notice that the only nontrivial constraints are of the type $x_i - x_j < 1$, with $i = k+1,k+2,\ldots,m$ and $j = 1,2,\ldots,k$. Writing $x_i = -y_i$ for $i = 1,2,\ldots,k$, we therefore have

$$ V_m(k) = \int_0^1 dy_1\,dy_2\cdots dy_k\,dx_{k+1}\,dx_{k+2}\cdots dx_m\;\theta\Big( \max_j x_j + \max_i y_i < 1 \Big) . \tag{8.59} $$

Relabeling such that $\max_i y_i = y_1$ and $\max_j x_j = x_m$ then leads us to

$$ \begin{aligned} V_m(k) &= k(m-k) \int_0^1 dy_1 \int_0^{y_1} dy_2\cdots dy_k \int_0^1 dx_m \int_0^{x_m} dx_{k+1}\cdots dx_{m-1}\;\theta(x_m+y_1 < 1) \\ &= k(m-k) \int_0^1 dy_1\,y_1^{k-1} \int_0^{1-y_1} dx\,x^{m-k-1} \\ &= k \int_0^1 dy\,y^{k-1}(1-y)^{m-k} = \frac{k!\,(m-k)!}{m!} , \end{aligned} \tag{8.60} $$

so that we find

$$ \mathrm{Vol}(P_m) = m+1 . \tag{8.61} $$

Accordingly, the rejection algorithm will quickly become inefficient, below 1% for $m > 10$. The above calculation actually allows us to construct an optimal algorithm by working backwards. In the following each $\rho_i$ stands for a new call to the random number source.

Algorithm 8.8.1 (POLYTOPE)

1. choose an integer $k$. Since $m!\,V_m(k)/k!(m-k)! = 1$, it should be chosen uniformly in $[0,m]$, so

$$ k \leftarrow \lfloor (m+1)\rho_0 \rfloor . $$

2. if $k = 0$ we simply have

$$ x_i \leftarrow \rho_i , \qquad i = 1,\ldots,m . $$

If $k = m$ we use

$$ x_i \leftarrow -\rho_i , \qquad i = 1,\ldots,m . $$

3. for $0 < k < m$, generate $y_1$ in $[0,1]$ according to the distribution $y_1^{k-1}(1-y_1)^{m-k}$. An efficient algorithm to do this is Cheng's rejection algorithm BA for beta random variates (cf. [5]; note the erratum in the footnote), but also the following works:

$$ v_1 \leftarrow -\log\Big( \prod_{i=1}^{k} \rho_i \Big) , \qquad v_2 \leftarrow -\log\Big( \prod_{i=1}^{m-k+1} \rho_i \Big) , \qquad y_1 \leftarrow \frac{v_1}{v_1+v_2} . $$

4. generate $x_m$ in $[0,1-y_1]$ according to the distribution $x_m^{m-k-1}$. The algorithm to do this is

$$ x_m \leftarrow (1-y_1)\,\rho^{1/(m-k)} . $$

5. generate the $y_{2,\ldots,k}$ uniformly in $[0,y_1]$ and flip sign:

$$ x_1 \leftarrow -y_1 , \qquad x_i \leftarrow -\rho_iy_1 , \quad i = 2,3,\ldots,k . $$

6. generate the $x_{k+1,\ldots,m-1}$ uniformly in $[0,x_m]$:

$$ x_j \leftarrow \rho_jx_m , \qquad j = k+1,k+2,\ldots,m-1 . $$

7. Finally, perform a random permutation of the whole set of $x$ values.
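Algorithm 8.8.1 can be sketched in Python as follows. This is an illustrative implementation, not the authors' code: it uses the logarithm variant of step 3 instead of Cheng's algorithm, the standard-library generator `random.Random` instead of RANLUX, and `rng.shuffle` for the final permutation. The function name `polytope_point` is hypothetical.

```python
import math
import random


def polytope_point(m, rng):
    """Return one point distributed uniformly in
    P_m = {x in [-1,1]^m : |x_i - x_j| < 1 for all i, j}."""
    k = int((m + 1) * rng.random())        # number of negative coordinates
    if k == 0:
        return [rng.random() for _ in range(m)]
    if k == m:
        return [-rng.random() for _ in range(m)]
    # step 3: y1 with density ~ y^(k-1) (1-y)^(m-k), from sums of -log(rho);
    # 1 - random() lies in (0, 1], which avoids log(0)
    v1 = -sum(math.log(1.0 - rng.random()) for _ in range(k))
    v2 = -sum(math.log(1.0 - rng.random()) for _ in range(m - k + 1))
    y1 = v1 / (v1 + v2)
    # step 4: x_m in [0, 1 - y1] with density ~ x^(m-k-1)
    xm = (1.0 - y1) * rng.random() ** (1.0 / (m - k))
    # step 5: negative coordinates, the largest in magnitude being -y1
    x = [-y1] + [-rng.random() * y1 for _ in range(k - 1)]
    # step 6: positive coordinates, the largest being x_m
    x += [rng.random() * xm for _ in range(m - k - 1)] + [xm]
    # step 7: random permutation of all coordinates
    rng.shuffle(x)
    return x
```

By construction the negative coordinates lie in $[-y_1,0]$ and the positive ones in $[0,x_m]$ with $x_m + y_1 \le 1$, so every returned point satisfies the polytope constraints without any rejection.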



8.8.1 Computational complexity 

The number usage $S$, that is, the expected number of calls to the random number source $\rho$ per event, can be derived easily. In the first place, 1 number is used to get $k$ for every event. In a fraction $2/(m+1)$ of the cases, only $m$ calls are made. In the remaining cases, there are $k + (m-k+1) = m+1$ calls to get $y_1$, and 1 call for each of the other $x$ values. Finally, the simplest permutation algorithm calls $m-1$ times [2]. The expected number of calls is therefore

$$ S = 1 + \frac{2m}{m+1} + \frac{m-1}{m+1}\,\big( (m+1) + (m-1) + (m-1) \big) = \frac{3m^2-m+2}{m+1} . \tag{8.62} $$

For large $m$ this comes to about $3m-1$ calls per event. Using a more sophisticated permutation algorithm would use at least 1 call, giving

$$ S = 1 + \frac{2m}{m+1} + \frac{m-1}{m+1}\,\big( (m+1) + (m-1) + (1) \big) = 2m . \tag{8.63} $$
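The counting behind (8.62) and (8.63) can be checked by direct enumeration over $k$. The Python sketch below uses exact rational arithmetic; the function name `expected_calls` and its argument `permutation_calls` are introduced here only for illustration.

```python
from fractions import Fraction


def expected_calls(m, permutation_calls):
    """Expected number of random-number calls per event, averaging over
    the uniformly chosen k in {0, ..., m}."""
    total = Fraction(0)
    for k in range(m + 1):
        calls = 1                                  # one call to choose k
        if k in (0, m):
            calls += m                             # all coordinates drawn directly
        else:
            calls += (m + 1)                       # k + (m-k+1) calls for y1
            calls += (m - 1)                       # one call per remaining coordinate
            calls += permutation_calls             # final permutation
        total += Fraction(calls, m + 1)
    return total
```

With `permutation_calls = m - 1` this reproduces $(3m^2-m+2)/(m+1)$, and with a single call for the permutation it gives $2m$.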



(Footnote) There is an error on page 438 of [5], where "$V \leftarrow \lambda^{-1}U_1(1-U_1)^{-1}$" should be replaced by "$V \leftarrow \lambda^{-1}\log[U_1(1-U_1)^{-1}]$".

We observed that Cheng's rejection algorithm to obtain $y_1$ uses about 2 calls per event. Denoting this number by $C$, the expected number of calls becomes

$$ S = \frac{2m^2+(C-1)m-C+3}{m+1} \tag{8.64} $$

for the simple permutation algorithm, while the more sophisticated one would yield

$$ S = \frac{m^2+(C+2)m-C+1}{m+1} \sim m+C+2 . \tag{8.65} $$

We see that in all these cases the algorithm is uniformly efficient in the sense that the needed number of calls is simply proportional to the problem's complexity $m$, as $m$ becomes large. An ideal algorithm would of course still need $m$ calls, while the straightforward rejection algorithm rather has $S = m2^m/(m+1) \sim 2^m$ expected calls per event.

In the testing of algorithms such as this one, it is useful to study expectation values of, and correlations between, the various $x_i$. Inserting either $x_i$ or $x_ix_j$ in the integral expression for $V(P)$, we found after some algebra the following expectation values:

$$ E(x_i) = 0 , \qquad E(x_i^2) = \frac{m+3}{6(m+1)} , \qquad E(x_ix_j) = \frac{m+3}{12(m+1)} \quad (i \ne j) , \tag{8.66} $$

so that the correlation coefficient between two different $x$'s is precisely 1/2 in all dimensions! This somewhat surprising fact allows for a simple but powerful check on the correctness of the algorithm's implementation.

As an extra illustration of the efficiency, we present in Tab. 8.4 the cpu-time ($t_{cpu}$) needed to generate 1000 points in an $m$-dimensional polytope, both with the algorithm just presented (OURALG) and the rejection method (REJECT). In the latter, we just

1. put $x_i \leftarrow 2\rho_i - 1$ for $i = 1,\ldots,m$;

2. reject $x$ if $|x_i-x_j| > 1$ for any $i = 1,\ldots,m-1$ and $j = i+1,\ldots,m$.

The computations were done using a single 300-MHz UltraSPARC-IIi processor, and the random number generator used was RANLUX on level 3. For $m = 2$ and $m = 3$, the rejection method is quicker, but from $m = 4$ on, the cpu-time clearly grows linearly for OURALG and exponentially for the rejection method.
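For comparison, the rejection method described in the two steps above can be sketched in Python as follows. The sketch uses the standard-library generator rather than RANLUX, and the function name `reject_polytope_point` is hypothetical. The pairwise condition $|x_i-x_j| > 1$ for some pair is tested through the equivalent condition that the largest and smallest coordinates differ by more than 1.

```python
import random


def reject_polytope_point(m, rng):
    """Generate a point uniformly in P_m by rejection from [-1,1]^m.

    Returns the accepted point and the number of trials that were needed."""
    trials = 0
    while True:
        trials += 1
        x = [2.0 * rng.random() - 1.0 for _ in range(m)]
        # accept only if no pair of coordinates differs by more than 1
        if max(x) - min(x) <= 1.0:
            return x, trials
```

The expected number of trials is $2^m/(m+1)$, the inverse of the efficiency $\mathrm{Vol}(P_m)/2^m$, which is what makes this approach exponentially slow for large $m$.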

8.8.2 Extension 

Let us, finally, comment on one possible extension of this algorithm. Suppose that the points $x$ are distributed on the polytope $P_m$, but with an additional (unnormalized) density given by

$$ F(x) = \prod_{i=1}^{m} \cos(\tfrac12\pi x_i) , \tag{8.67} $$



t_cpu (sec)
m     OURALG    REJECT
2     0.03      0.01
3     0.03      0.02
4     0.03      0.04
5     0.04      0.08
6     0.05      0.17
7     0.06      0.32
8     0.07      0.67
9     0.08      1.33
10    0.09      2.76
11    0.09      5.15
12    0.10      10.94
13    0.11      21.71
14    0.12      44.06
15    0.13      -
16    0.14      169.65
17    0.15      336.67
18    0.16      671.46
19    0.17      1383.33
20    0.18      2744.82

Table 8.4: The cpu-time (in seconds) needed to generate 1000 points in $P_m$.



so that the density is suppressed near the edges. It is then still possible to compute V_m(k) for 
this new density. Writing s(x) := sin(½πx) and c(x) := cos(½πx), we have 



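\begin{align}
V_m(k) &= k(m-k) \int_0^1 dx_1\, c(x_1) \left( \int_0^{x_1} dx\, c(x) \right)^{k-1} \int_{x_1-1}^{0} dx_m\, c(x_m) \left( \int_{x_m}^{0} dx\, c(x) \right)^{m-k-1} \\
&= k(m-k) \left( \frac{2}{\pi} \right)^{m} \int_0^1 ds(x_1)\, s(x_1)^{k-1} \int_{-c(x_1)}^{0} ds(x_m)\, \left[ -s(x_m) \right]^{m-k-1} \\
&= k \left( \frac{2}{\pi} \right)^{m} \int_0^1 ds\, s^{k-1} (1-s^2)^{(m-k)/2} \\
&= \left( \frac{2}{\pi} \right)^{m} \frac{\Gamma(1+\frac{k}{2})\, \Gamma(1+\frac{m-k}{2})}{\Gamma(1+\frac{m}{2})} , \tag{8.68}
\end{align}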



where we used s := s(x_1). Therefore, a uniformly efficient algorithm can be constructed in this 
case as well, along the following lines. Using the V_m(k), the relative weights for each k can be 
determined. Then s is generated as a β-distribution. The generation of the other x_i's involves 
only manipulations with sine and arcsine functions. Note that, for large m, the weighted volume 
V(P_m) is 

\[ \sum_{k=0}^{m} \frac{m!}{k!\,(m-k)!}\, V_m(k) \sim \sqrt{\frac{\pi m}{2}} \left( \frac{8}{\pi^2} \right)^{m/2} , \tag{8.69} \]



so that a straightforward rejection algorithm would have number usage 

[expression illegible in the source] (8.70) 

and a correspondingly decreasing efficiency. 
Summary 



Since particle physicists consider themselves scientists, they apply the scientific method in their 
exploration of nature, which means that they perform experiments and have a model that tries to 
reproduce the results of the experiments. Furthermore, as the scientific method demands, they 
make predictions derived from the model, which are believed to be testable in the (near) future. 
The model gives the best description of nature on the most fundamental level that is currently 
accessible by the experiments, and is called the Standard Model. It is based on the physical 
concept of quantum mechanical particles and the mathematical construction of quantum field 
theory. 

Quantum theories describe nature by the dynamics of states, and for a model of quantum 
particles, these are states of particles. At one time, this state with these particles is appropriate 
to a physical situation, and at another time, that state with those particles is appropriate. An 
important piece of information provided by the model comes from the transition probabilities, 
which give the probability to get, in a certain situation, from one certain particle state to another 
certain particle state. These probabilities are interpreted as the ratios of the number of times 
the different states should appear, starting from the same state every time. In an experiment, 
one particular state is prepared millions of times, and then it is counted how often it goes over 
into which other state. These numbers are then compared with the probabilities predicted by the 
model in order to check its validity. 

Conceptually, the connection between model and experiments with the help of the transition 
probabilities is easy. Practically, however, it is difficult. One of the difficulties lies in the fact 
that, among other things, the momenta of the particles belong to the characteristics of a state. 
These include the information in which directions the particles are moving, and it is possible 
that states only differ in these directions. The types and numbers of particles involved may be 
exactly the same; if the momenta are just slightly different, the states are different. The problem 
is that it is impossible to interpret the probabilities as described before, if the number of different 
states is so large. If a state with a definite momentum configuration for the particles appears at 
all, it will appear at most one time, and the model predicts probability zero for the state to occur. 
One can only speak about a non-zero probability for the direction of a particle, if it concerns a 
certain (small) range of directions within which the particular direction is predicted to be. The 
set of all possible momentum configurations is a continuum, called phase space, and transition 
probabilities are probability densities over phase space. 
The solution to the problem is to consider states that only differ in momentum configuration 
as equivalent. In an experiment, this just means that such states should not be counted separately, 
but should be collected together. For the model, this means that the average of the transition 
probabilities over the momentum configurations, over phase space, has to be calculated, and has 
to be multiplied by the total magnitude, the volume, of phase space. This is an example of the 
mathematical procedure of integration. It asks for a measure on phase space, which is given by 
the model of quantum particles. The quantity of which the average has to be calculated, in this 
case the probability density, is called the integrand. The integration problem is again difficult to 
solve, but its solution is accessible, especially with the help of computers. 

A popular method to integrate an integrand over phase space is the Monte Carlo method, 
and the idea behind it is, again, simple: just do the same as the experimentalists. Take the 
average over a (finite) number of momentum configurations, or phase space points, chosen at 
random, and hope that the result comes close to the exact average. Probability theory tells us 
that, if the points are chosen according to a uniform distribution, and their number becomes 
larger and larger, then the result converges to the exact average. To understand what is meant 
by 'chosen according to a uniform distribution', it is helpful to consider the process of choosing 
points in phase space as the delivery of points by phase space itself. If the points are distributed 
uniformly, then the probability for each region of phase space to deliver the following point is 
proportional to the volume of that region; at each instance in the process, all regions should get a 
fair chance to deliver a point. The volumes are measured by the measure mentioned before. The 
number of points needed to get a result that is as close to the exact average as demanded can be 
derived from a formula for the expected deviation at each number of used points. This formula 
is supplied by probability theory, and shows an expected deviation that becomes smaller as the 
number of used points grows. 
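
For readers who like to see this in practice, the procedure could be sketched as follows. This is a minimal illustration and not code from this thesis: phase space is replaced by a three-dimensional unit cube, and the integrand and the function name `monte_carlo_average` are hypothetical choices. The expected deviation is estimated from the spread of the sampled values and shrinks like one over the square root of the number of points.

```python
import math
import random

def monte_carlo_average(f, dim, n_points, rng):
    """Average f over uniformly chosen points in the unit cube; return (mean, error)."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n_points):
        x = [rng.random() for _ in range(dim)]  # every region gets a fair chance
        value = f(x)
        total += value
        total_sq += value * value
    mean = total / n_points
    variance = max(total_sq / n_points - mean * mean, 0.0)
    error = math.sqrt(variance / n_points)  # expected deviation of the mean
    return mean, error

def integrand(x):
    # Hypothetical test integrand: the product of 2*x_i, whose exact average is 1.
    product = 1.0
    for xi in x:
        product *= 2.0 * xi
    return product

rng = random.Random(1)  # arbitrary fixed seed
estimate, error = monte_carlo_average(integrand, 3, 100000, rng)
print("estimate:", round(estimate, 4), "expected deviation:", round(error, 4))
```

Using four times as many points should roughly halve the expected deviation, whatever the number of dimensions.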

The Monte Carlo method almost always works. There are some restrictions on the integrand, 
but the number of degrees of freedom over which it has to be averaged, the number of dimensions, 
does not matter. The only drawback of the method is that it can be rather slow, because the 
number of phase space points often has to be large. In those cases, it pays to 'load the dice', and 
not to give all regions a fair chance to deliver points. 

The reason why the Monte Carlo method works is that enough information about the integrand 
is obtained to make a good estimate of its average. The points get distributed uniformly 
over phase space, so that information is obtained that is diverse enough for a trustworthy 
average. However, if it is known for which regions of phase space the integrand shows its most 
diverse behavior, one would like to use more points in those regions, and fewer in the less 
interesting regions. This can be achieved by giving a larger (than fair) probability to the 
interesting regions to deliver points. These probabilities have to be known exactly, in order to 
compensate for the cheating when the average is calculated: points coming from uninteresting 
regions should get a higher weight, since fewer of them are used. This improvement of the Monte 
Carlo method is called importance sampling, and the second part of this thesis deals with an 
explicit application to specific kinds of transition probabilities. 
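
A toy version of 'loading the dice' could look as follows. It is a sketch under stated assumptions and unrelated to the QCD application of this thesis: the one-dimensional integrand f(x) = 3x^2 and the sampling density g(x) = 2x are hypothetical choices, made because the exact average (equal to 1) and the compensating weight f(x)/g(x) are easy to write down.

```python
import math
import random

def f(x):
    # Hypothetical integrand on [0, 1]; its exact average is 1.
    return 3.0 * x * x

def estimate(values):
    """Return the mean of the values and the expected deviation of that mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance / n)

rng = random.Random(7)  # arbitrary fixed seed
n_points = 50000

# Fair sampling: every region of [0, 1] has the same chance to deliver a point.
uniform_values = [f(rng.random()) for _ in range(n_points)]

# Loaded dice: points are drawn with density g(x) = 2x, so more points land where
# f is large, and each value is divided by g(x) to compensate for the cheating.
weighted_values = []
for _ in range(n_points):
    x = math.sqrt(1.0 - rng.random())  # inverse transform for g(x) = 2x, x in (0, 1]
    weighted_values.append(f(x) / (2.0 * x))

uniform_mean, uniform_error = estimate(uniform_values)
weighted_mean, weighted_error = estimate(weighted_values)
print("fair dice:  ", round(uniform_mean, 4), "+/-", round(uniform_error, 4))
print("loaded dice:", round(weighted_mean, 4), "+/-", round(weighted_error, 4))
```

Both estimates aim at the same average, but with the same number of points the weighted one has the smaller expected deviation, because the weights f(x)/g(x) vary less than f(x) itself.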
Calculations of the kind described above are usually done with the help of computers, for 
which there are 'standard' algorithms to deliver, or generate, numbers between 0 and 1. These 
can be considered distributed 'as good as' randomly according to the uniform distribution. They 
are called (pseudo) random numbers, where the 'pseudo' represents the 'as good as' in the 
previous sentence. A computer cannot do things at random, but it can run algorithms that deliver 
randomly looking results, and for the purpose of Monte Carlo integration, this suffices. With the 
algorithms mentioned above, all other kinds of random points in spaces, needed for the application 
of the Monte Carlo method, have to be constructed, and in practice, this is almost always possible. 
Every problem of calculating an average with the Monte Carlo method has to be reduced to a 
problem of taking the average of a (complicated) integrand over many degrees of freedom that 
all range from 0 to 1. This is also the case for importance sampling, where one just chooses a 
smart integrand. So eventually, one always applies the ordinary Monte Carlo method on a space 
of variables between 0 and 1, called a hypercube, using configurations of random numbers, again 
called points. 

As noted before, the Monte Carlo method works because the points get distributed uniformly 
over the hypercube. However, random numbers will not necessarily deliver points that are 
distributed as uniformly as possible over the hypercube, and there is room for improvement. This 
sounds confusing, but there is a difference between the uniform distribution in the probabilistic 
sense, with fair probabilities for all regions of the hypercube to deliver a point, and the uniformity 
of the distribution over the hypercube of a given set of points; one only uses the same words. In 
the first case, there are fair probabilities at each instance in the process, so that two successive 
points can still get close together, and this is something one would like to avoid. If 
there is a region where a few points have shown up already, and another which is still empty, then 
it is time that the latter region delivers a point. As a result of this kind of 'fudging', the points are 
not chosen independently anymore, but one might need fewer of them for a good estimate of the 
average. This method is called the Quasi Monte Carlo method, and the points are called quasi 
random. 
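
The difference can be illustrated in one dimension. The sketch below is an assumption-laden example and not taken from this thesis: it uses the van der Corput sequence in base 2 as the quasi random points and a hypothetical integrand x^2 on the interval from 0 to 1, whose exact average is 1/3.

```python
import random

def van_der_corput(index, base=2):
    """Radical inverse of index in the given base: a classic quasi random sequence."""
    value = 0.0
    denom = 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def average(f, points):
    return sum(f(x) for x in points) / len(points)

def f(x):
    # Hypothetical integrand on [0, 1]; its exact average is 1/3.
    return x * x

n_points = 1024
rng = random.Random(3)  # arbitrary fixed seed
random_points = [rng.random() for _ in range(n_points)]
quasi_points = [van_der_corput(i) for i in range(n_points)]

random_error = abs(average(f, random_points) - 1.0 / 3.0)
quasi_error = abs(average(f, quasi_points) - 1.0 / 3.0)
print("pseudo random points:", random_error)
print("quasi random points: ", quasi_error)
```

With these 1024 quasi random points the deviation is below one part in a thousand, whereas for independent random points a deviation of the order of one percent is typical at this sample size.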

The Quasi Monte Carlo method also has its drawbacks. First of all, it is easier to let a 
computer choose the points with fair chances than distributed as uniformly as possible. Secondly, 
the formula for the expected deviation of the result only works for the normal Monte Carlo 
method. So the Quasi Monte Carlo result may be better, but one does not know by how much. There 
are formulas that can be used, and they ask for the rate of non-uniformity, the discrepancy, of the 
set of used points. These formulas, however, are very complicated. 

One way to compare the normal and the Quasi Monte Carlo method is by calculating the 
probability for a set of points, consisting of random numbers, to have a certain discrepancy. If 
there is a large probability for the discrepancy to be as small as that of a quasi random 
set of points, then the two methods are equally good. If this probability is small, then it is better 
to use the Quasi Monte Carlo method. The first part of this thesis is devoted to the calculation of 
such probability distributions. 



Samenvatting (Summary) 



Since particle physicists consider themselves scientists, they apply the scientific method in 
their study of nature. This means that they perform experiments and have a model that tries to 
reproduce the results of the experiments. Moreover, they make predictions on the basis of the 
model, which are believed to be verifiable in the (near) future, as the scientific method demands. 
This model gives the best description of nature at the most fundamental level accessible to 
current experiments and is called the Standard Model. It is based on the physical concept of 
quantum mechanical particles and the mathematical formalism of quantum field theory. 

Quantum theories describe nature by means of the dynamics of states, and for a model of 
quantum particles these are particle states. At one moment this state with these particles applies 
to a physical situation, and at another moment that state with those particles. Important 
information provided by the model comes from the transition probabilities, which give the 
probability to go, in a given situation, from one state to another. These probabilities are 
interpreted as the ratios of the numbers of times the different states would appear when one 
starts from the same state every time. In an experiment, one and the same state is prepared 
millions of times, and it is then counted how often it goes over into which other states. These 
numbers are then compared with the probabilities predicted by the model, so that its validity can 
be checked. 

Conceptually, the connection between the model and the experiments is easy to make with the 
help of the transition probabilities. In practice, however, it is difficult. One of the difficulties 
lies in the fact that the momenta of the particles are part of the characteristics of a state. These 
include, among other things, the directions in which the particles move, and it is possible that 
states differ only in these directions. The types and numbers of particles may be exactly the 
same; if the momenta differ, the states differ. The problem is that it is impossible to interpret 
the probabilities as just described if the number of states is so large. If a state with a definite 
momentum configuration occurs at all, it does so at most once, and the model predicts a 
probability equal to zero for it to occur. With respect to the direction of a particle, one can only 
speak of a non-zero probability if it concerns a certain range of directions within which the 
direction is predicted to lie. The set of all possible momentum configurations is a continuum, 
called phase space, and the transition probabilities are probability densities on phase space. 

The solution to the problem is to regard states that differ only in momentum configuration 
as equivalent. In an experiment, this simply means that such states should not be counted 
separately, but taken together. For the model, it means that the average of the transition 
probabilities over phase space has to be taken and multiplied by the total extent, the volume, of 
phase space. This is an example of the mathematical procedure of integration. It requires a 
measure on phase space, which is supplied by the model of quantum particles. The quantity of 
which the average has to be calculated, in this case the transition probability, is called the 
integrand. The integration problem is again difficult to solve, but the solution is within reach, 
in particular with the help of computers. 

A popular method to integrate an integrand over phase space is the Monte Carlo method, and 
the idea behind it is, again, simple: just do the same as the experimentalists. Take the average 
over a (finite) number of randomly chosen momentum configurations, also called points in phase 
space, and hope that the result comes close to the exact average. Probability theory tells us that 
the result converges to the exact average if the points are chosen according to a uniform 
distribution and their number becomes larger and larger. To understand what is meant by 'chosen 
according to a uniform distribution', it is useful to view the process of choosing points in phase 
space as the delivery of points by phase space itself. If the points are uniformly distributed, then 
for each region of phase space the probability to deliver the next point is proportional to the 
volume of that region: at every moment of the process, all regions should get a fair chance to 
deliver a point. The volumes are measured with the measure mentioned earlier. The number of 
points needed to obtain a result that lies close enough to the exact average can be derived from a 
formula for the expected deviation after each number of used points. This formula comes from 
probability theory and shows an expected deviation that decreases with the number of used points. 

The Monte Carlo method almost always works. There are a few restrictions on the integrand, but 
the number of degrees of freedom over which the average has to be taken, the number of 
dimensions, does not matter. The only drawback of the method is that it can be rather slow, 
because the number of points needed is often large. In those cases, it pays to 'cheat' and not to 
give all regions a fair chance. 

The reason why the Monte Carlo method works is that enough information about the integrand 
is obtained to make a good estimate of its average. The points are spread evenly over phase 
space, so that the information is varied enough for a reliable average. However, if it is known in 
which regions of phase space the integrand shows its most varied behavior, one would like to use 
more points there than in the less interesting regions. This can be achieved by giving the 
interesting regions a larger (than fair) chance to deliver points. These chances do have to be 
known exactly, so that the 'cheating' can be compensated for when the average is calculated: 
points from the uninteresting regions have to get a higher weight, because fewer of them are 
used. This improvement of the Monte Carlo method is called importance sampling, and the second 
part of this thesis treats an explicit application to a specific kind of transition probabilities. 



Calculations of the kind described above are usually done with the help of computers, for 
which 'standard' algorithms exist that deliver numbers between 0 and 1. These can be considered 
'as good as' randomly distributed according to the uniform distribution. They are called (pseudo) 
random numbers, where the 'pseudo' represents the 'as good as' in the previous sentence. A 
computer cannot do random things, but it can execute algorithms that deliver results that look 
random and suffice for the purpose of Monte Carlo integration. With the algorithms mentioned 
above, all other kinds of random points in spaces, needed for the application of the Monte Carlo 
method, have to be constructed, and in practice this is almost always possible. Every problem of 
calculating an average with the Monte Carlo method has to be reduced to taking the average of a 
(complicated) integrand over many degrees of freedom that all run from 0 to 1. This is also the 
case for importance sampling, where one merely chooses a smart integrand. So in the end, one 
always applies the ordinary Monte Carlo method to a space of variables that run from 0 to 1, 
called a hypercube, using configurations of random numbers, again called points. 

As just described, the Monte Carlo method works because the points are spread evenly over 
the hypercube. Random numbers, however, do not necessarily give the most even distribution 
possible, and there is room for improvement.¹ With random numbers, there are equal chances at 
every moment in the process, and it can happen that two successive points end up close together, 
which one would like to prevent. If there is a region where some points have already appeared 
and another where there are none yet, then it is time for the latter region to deliver a point. As a 
result of this 'fudging', the points are no longer chosen independently of each other, but fewer of 
them may be needed for a good estimate of the average. This method is called the Quasi Monte 
Carlo method, and the points are called quasi random. 

The Quasi Monte Carlo method also has its drawbacks. First, it is easier to let a computer 
choose points with equal chances than distributed as evenly as possible. Second, the formula for 
the expected deviation only works for the ordinary Monte Carlo method. So the Quasi Monte Carlo 
result may well be better, but one does not know by how much. There are formulas that can be 
used, and they ask for the degree of non-uniformity, the discrepancy, of the set of used points. 
These formulas, however, are very complicated. 

One way to compare the normal and the Quasi Monte Carlo method is by calculating the 
probability that a set of points consisting of random numbers shows a certain discrepancy. If 
there is a large probability for the discrepancy to be as small as for a quasi random set, then the 
two methods are equally good. If this probability is small, then it is better to use the Quasi Monte 
Carlo method. The first part of this thesis is devoted to the calculation of such probability 
distributions. 

¹ In English, confusion can arise here, because the word 'uniformly' is also used for the word 
'gelijkmatig' (evenly). 